Executive Summary


Experiments 1 through 12 were reported as the Interim Technical Report on February 28, 2001. Experiments 13 through 19 were added on June 28, 2001.


Experiment  1:  Combat  Modeling  and  Validation 

The  purpose  of  this  experiment  is  to  validate  the  low-order  ordinary  differential  equation  (ODE)  models, 
which  are  derived  to  approximate  the  evolution  of  expected  values  in  a  more  realistic  hybrid-stochastic 
model,  called  the  Probabilistic  Mission  Dynamics  Model  (PMDM),  under  different  assumptions  for  target 
acquisition  and  target  selection  coordination. 

The  hypothesis  is  that  the  evolution  of  the  expected  values  of  the  Markov  chain  (MC)  mission  dynamics 
can  be  approximated  by  a  low-order  ordinary  differential  equation  (ODE)  model,  for  a  time  period  of 
sufficient  duration,  when  the  control  signals  are  generated  in  an  open-loop  setting. 

This  study  first  identifies  four  different  sets  of  assumptions  about  target  acquisition  and  target  selection 
coordination,  which  will  be  abbreviated  as: 

MARI  (Acquisition  Rate  Independent)  Uncoordinated  target  selection,  independent  target  acquisition, 
MARD  (Acquisition  Rate  Dependent)  Uncoordinated  target  selection,  linear  target  acquisition, 
MNWA  (No  Wrap-around)  Coordinated  target  selection,  without  wrap-around, 

MWWA  (With  Wrap-around)  Coordinated  target  selection,  with  wrap-around. 

Then, continuous-transition Markov chain (MC) models are developed under these assumptions. Using the MC models, the probability distributions for the number of platforms and their exact expected values $\eta^B, \eta^R$ are calculated as a function of time. Next, approximate ODE models of the evolution of the expected values are derived.

The trajectories of the ODE models $(\hat{\eta}^B, \hat{\eta}^R)$ are compared with the exact outcomes $(\eta^B, \eta^R)$ in twelve experiments, summarized in the table below, which indicates the initial number of platforms and the probability of kill for the Blue and Red units.


Scenarios Used in the Experiments

Scenario types ($N^B = 8$, $P_k^B = 0.8$):

Scenario   Initial platforms   Probability of kill
A          $N^B = N^R$         $P_k^B = P_k^R$
B          $N^B = 2N^R$        $P_k^B = P_k^R$
C          $N^B = N^R$         $P_k^B = 2P_k^R$

The approximation quality of the ODE models is compared in the following table using the $L_2$ norm:

$L_2[0, 10]$-Norm of Error Between Actual and Approximate Expected Values

Exp. #   Model   $\|\eta^B - \hat{\eta}^B\|$   $\|\eta^R - \hat{\eta}^R\|$   Sum of both norms
1        MARI    0.14                          0.14                          0.28
2        MARI    0.37                          1.35                          1.72
3        MARI    0.01                          0.25                          0.26
4        MARD    0.06                          0.06                          0.12
5        MARD    0.03                          0.06                          0.09
6        MARD    0.02                          0.04                          0.06
7        MNWA    1.35                          1.35                          2.70
8        MNWA    5.80                          5.80                          11.60
9        MNWA    0.02                          0.04                          0.06
10       MWWA    1.53                          1.53                          3.06
11       MWWA    4.74                          0.06                          4.80
12       MWWA    0.16                          0.58                          0.74

It was observed that the ODE models were good approximations under the uncoordinated target selection assumption, and were found to be sufficient to represent the attrition dynamics in a differential game setup in this case. The discrepancy between the MC and ODE models increases as the engagement proceeds, which should be expected. Under the coordinated target selection assumption, the ODE approximations were worse. This can be partially explained by the fact that coordination implies firing in rounds, and therefore platform loss is more discrete in nature.
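
The comparison described above can be illustrated with a minimal sketch. It is not the report's PMDM or any of its four models: it assumes a simple aimed-fire attrition rule in which each surviving platform destroys an opposing platform at a constant rate, and the names `l2_error`, `a_b`, and `a_r` are hypothetical. The exact expected values come from integrating the Markov chain master equation, the approximation is the corresponding mean-field ODE, and the error is the $L_2[0, 10]$ norm of the difference.

```python
import numpy as np

def l2_error(nb=8, nr=8, a_b=0.08, a_r=0.08, t_end=10.0, dt=1e-3):
    """Return (err_blue, err_red): L2[0, t_end] norms of exact minus ODE expected values.

    Assumed (hypothetical) attrition rule: state (b, r) loses a Blue platform
    at rate a_r * r and a Red platform at rate a_b * b while both sides survive.
    """
    b = np.arange(nb + 1)[:, None]       # Blue platform count along axis 0
    r = np.arange(nr + 1)[None, :]       # Red platform count along axis 1
    blue_loss = a_r * r * (b > 0)        # rate of (b, r) -> (b - 1, r)
    red_loss = a_b * b * (r > 0)         # rate of (b, r) -> (b, r - 1)

    p = np.zeros((nb + 1, nr + 1))       # probability of each state
    p[nb, nr] = 1.0                      # start with all platforms alive
    xb, xr = float(nb), float(nr)        # mean-field ODE state
    sq_b = sq_r = 0.0
    for _ in range(int(round(t_end / dt))):
        # Exact expected values from the current distribution.
        eb, er = (p * b).sum(), (p * r).sum()
        sq_b += (eb - xb) ** 2 * dt
        sq_r += (er - xr) ** 2 * dt
        # Forward-Euler step of the master equation (probability is conserved).
        flow_b, flow_r = blue_loss * p, red_loss * p
        dp = -(flow_b + flow_r)
        dp[:-1, :] += flow_b[1:, :]
        dp[:, :-1] += flow_r[:, 1:]
        p = p + dt * dp
        # Forward-Euler step of the low-order ODE, clipped at zero platforms.
        dxb = -a_r * xr if xb > 0 else 0.0
        dxr = -a_b * xb if xr > 0 else 0.0
        xb, xr = max(xb + dt * dxb, 0.0), max(xr + dt * dxr, 0.0)
    return sq_b ** 0.5, sq_r ** 0.5

err_b, err_r = l2_error()
print(f"Blue error: {err_b:.3f}, Red error: {err_r:.3f}")
```

In this toy setting the two errors coincide when both sides start with the same number of platforms and the same rates, which mirrors the symmetric rows of the table above.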


Experiment  2:  Controller  Performance  Comparison  with 
Other  Controllers 

This experiment tests Hypothesis 2. The plant and internal models are the same, namely the Mission Dynamics Continuous-time Model (MDCM). No noise is added to the state variables when constructing the observed state variables (the output variables). The control actions of the Blue and Red teams are generated by one of the following strategies: the proposed game-theoretic algorithm, a simple heuristic stochastic strategy (e.g., a movement bias toward targets), a simple heuristic deterministic strategy, or a human planner.

The strategy adopted by Blue and Red is optimal in the sense of a Nash equilibrium with respect to the value function; that is, it maximizes the value function with respect to Red and minimizes it with respect to Blue.
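
Stated as a formula, this is the usual saddle-point condition for a zero-sum game. The symbols below are assumptions introduced here for illustration and do not appear in the summary: $J$ denotes the value function, $u^B$ and $u^R$ the Blue and Red control strategies, and an asterisk marks the equilibrium strategies.

\[
J(u^{B*}, u^{R}) \;\le\; J(u^{B*}, u^{R*}) \;\le\; J(u^{B}, u^{R*})
\qquad \text{for all admissible } u^{B},\, u^{R}.
\]

The left inequality says Red cannot increase the value by deviating unilaterally; the right inequality says Blue cannot decrease it by deviating unilaterally.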


Experiment  3:  Controller  Performance  under  Noise  in  the 
State  Observation 

We performed a series of experiments to evaluate the effectiveness of the current differential game technology as a means of countering enemy actions under idealized situations with perfect information about the enemy initial conditions and objectives, but with noisy measurements of the enemy state. Our main findings are that while average values look good, individual sample paths might be quite surprising. One can conclude that the game-theoretic controller CPC (Controller-Plant-Controller) is sensitive to observation noise. The first step to remedy the noise problem is to implement proper filters in the controllers.


Experiment  4:  Controller  Performance  under  Parameter 
Variations 

The purpose is to test how the Controller-Plant-Controller (CPC) setup reacts to parameter mismatches between the battlefield and the sides, as well as to parameter mismatches between the sides. Assuming that both sides have chosen game theory as the intelligence behind their controllers, systematic tests have been performed to investigate sensitivity, i.e., how strongly the proposed game-theoretic controller reacts to changes in the parameters. The important conclusion to draw from these experiments is that even a single parameter can have important effects on the outcome of the battle. It is therefore very important to be able to estimate the enemy parameters in order to succeed in the battle simulation.


Experiment 5: Controller Computational Complexity

The purpose of Experiment 5 is to test Hypothesis 5: the computational complexity of the differential game technology based controller, combined with an extended Kalman filter or a nonlinear observer, increases quadratically as a function of the number of units and linearly as a function of the mission duration.

A number of experiments have been performed to test this hypothesis.

In  the  set  of  experiments,  both  the  plant  and  internal  models  are  the  same,  given  by  MDCM.  In  a 
first  set  of  experiments  we  increase  the  number  of  units  in  the  scenario  while  the  mission  objectives  and 
duration  are  kept  constant.  In  a  second  set,  the  mission  duration  is  increased,  while  the  mission  objectives 
and  the  number  of  units  are  kept  constant.  The  computation  time  and  the  number  of  iterations  required 
for  the  computation  of  the  control  law  to  converge  were  recorded  in  both  cases. 

Our main conclusion is that the computational time required to reach the convergence criterion depends on many factors: the unit categories, the number of units, the initial trajectories, the weights in the cost function, the step size in our numerical procedure, the manner of engagement, and the initial positions and target locations. The number of iterations required to reach the convergence criterion depends on the same factors. In our experimental results, the major factors affecting the computational time are the number of units and the mission duration. As expected from theoretical considerations, the computational time of the controller increased quadratically as a function of the number of units. It also increased linearly as a function of the mission duration, while the number of iterations remained relatively constant as a function of the number of units.
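
One common way to check a claimed growth order from timing measurements is a least-squares fit in log-log space, where the slope estimates the exponent. The sketch below is illustrative only: the helper name `fit_power_law` is hypothetical and the timing values are synthetic, generated from an exact quadratic law rather than taken from the report.

```python
import numpy as np

def fit_power_law(sizes, times):
    """Fit times ~ c * sizes**k by linear least squares on logarithms; return (k, c)."""
    k, log_c = np.polyfit(np.log(sizes), np.log(times), 1)
    return float(k), float(np.exp(log_c))

# Synthetic data (not from the report): time grows exactly quadratically with units.
units = np.array([2, 4, 8, 16])
cpu_seconds = 0.5 * units ** 2

k, c = fit_power_law(units, cpu_seconds)
print(f"estimated exponent: {k:.2f}, prefactor: {c:.2f}")
```

An exponent near 2 for the number of units and near 1 for the mission duration would be consistent with Hypothesis 5; measured data would of course scatter around the fitted line.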


Experiment 6: Controller with a Kalman Filter for Estimation

In  this  chapter,  we  present  an  algorithm  based  on  the  Extended  Kalman  Filter  (EKF)  for  state  estimation 
when  enemy  inputs  are  unavailable.  We  show  the  overall  structure  of  the  estimation  scheme  through  a 
block  diagram.  We  present  the  implementation  of  the  algorithm  for  the  air  operation  theater  through  a 
flowchart.  We  also  present  the  results  of  simulation  experiments. 


Experiment  7:  Controller  Applied  to  a  More  Realistic  Plant 

The purpose of this experiment is to observe the effect of the discrepancy between internal and plant models in a closed-loop setting. The internal model is a reduced-order ODE model, called the Mission Dynamics Continuous-time Model (MDCM 3.0), and the plant model is a full-order ODE model, abbreviated as EPMDM, which exactly describes the evolution of expected values in PMDM.

The hypothesis is that the current differential game technology provides an effective means of countering the actions of an enemy who may be either following the Nash solution or using some simple heuristic strategy, when noise-free state measurements are available, in spite of the mismatch between the plant and the internal models.

It is concluded that approximating the plant model with a lower-order internal model does not cause a significant difference in game results, as long as the engagement terminates before one side is completely wiped out.


Experiment 8: All Quadratic Method for Nash Computation

The purpose of Experiment 8 is to develop, implement and test the Sequential Quadratic-Quadratic Method (SQQM) for differential games, with two hypotheses of interest. The first hypothesis is that the Nash solution computed through the SQQM is identical to the one found using the Sequential Linear-Quadratic Method (SLQM); the second is that the SQQM improves convergence time. Speed can become of great importance in real-time applications; moreover, due to the presence of nonlinearity and constraints, a different approach serves the purpose of validating previous results. The algorithm is based on an iterative method for computing a Nash solution to a zero-sum differential game with a system of nonlinear differential equations.

Several experiments on different scenarios, based on both Model 2 and Model 3, have shown the convergence of the outputs of the SQQM and SLQM algorithms to the same solution, which confirms the first hypothesis of Experiment 8 in these scenarios. As for the second hypothesis, namely an improvement in convergence speed, the conclusion is that the SQQM alone proves to be fast in simple scenarios. If, however, the starting trajectory and costate estimates are too far from the optimal solution, the SLQM may be used first, with a switch to the SQQM once the solution estimate is closer to the optimal solution. In more complex cases, it is thus advantageous to blend the linear-quadratic algorithm and the quadratic-quadratic algorithm, taking advantage of both the superior stability of the SLQM and the superior speed of the SQQM.
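
The blending idea can be illustrated by analogy with a scalar minimization problem. The sketch below is not the SLQM or the SQQM: it uses plain gradient descent as the robust first phase and Newton's method as the fast second phase, on the hypothetical test function $f(x) = \sqrt{1 + x^2}$, for which a pure Newton iteration diverges whenever $|x| > 1$. The function names and the switching threshold are assumptions made for this example.

```python
import math

def grad(x):
    """First derivative of f(x) = sqrt(1 + x^2)."""
    return x / math.sqrt(1.0 + x * x)

def hess(x):
    """Second derivative of f(x) = sqrt(1 + x^2)."""
    return (1.0 + x * x) ** -1.5

def hybrid_minimize(x, switch_tol=0.4, tol=1e-12, max_iter=200):
    """Take damped gradient steps while far from the minimum, then Newton steps."""
    for k in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, k
        if abs(g) > switch_tol:
            x -= 0.5 * g          # robust phase: slow but always makes progress
        else:
            x -= g / hess(x)      # fast phase: Newton step, safe only near the minimum
    return x, max_iter

x_min, iterations = hybrid_minimize(3.0)
print(f"minimizer: {x_min:.2e} after {iterations} iterations")
```

Starting Newton's method directly from `x = 3.0` would move away from the minimum at every step, whereas the blended scheme uses the slower method only until the iterate is close enough for the faster one to converge.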


Experiment  9:  Detector  Performance  under  Noise 

In this chapter we report the experiments performed to test the effectiveness of a newly designed “game-theoretic-optimal” detection filter in handling noise-corrupted observations of the battlefield. The basic purpose of the detection filter is to reveal the occurrence of an “engagement action” by enemy units by monitoring only variables associated with the friendly units. The game-theoretic approach to the design of the filter makes it possible to attenuate the effects of measurement noise without attenuating the effects of the action to be detected. The experiments show clearly that the game-theoretic filter is effective under different noise conditions and compares favorably with a filter designed on the basis of classical state-estimation methods.


Experiment  10:  Detector  Performance  under  Parameter 
Variations 

This chapter describes the experimental results regarding the game-theoretic detection filter under parametric uncertainty. The exact values of the parameters in the mathematical model of the battlefield are not known, and only a nominal value is available. The filter, whose objective is to reveal the occurrence of an “engagement action” from enemy units, is designed on the basis of the nominal value. This set of experiments shows that the game-theoretic detection filter, although proven to be effective in the selective attenuation of measurement noise, is relatively sensitive to the uncertainty in the parameters.


Experiment  11:  Method  of  Characteristics 

The purpose is to verify that the solution computed by the Sequential Linear-Quadratic Method (SLQM) is the same as the Nash solution computed by the Method of Characteristics. We verified that the solutions computed by the Sequential Linear-Quadratic Method (SLQM) are the same as the Nash solutions computed by the Method of Characteristics under several scenarios. Also, systematic tests have been performed to study robustness under two ways of enforcing constraints: penalties and explicit enforcement. Specifically, weights for velocities, engagement intensities, final numbers of platforms and targets, as well as maximum rated speeds have been varied. The results show that the trajectories are quite similar in shape.


Experiment 12: Game Flow Model


The purpose of this experiment was to validate the Game Flow approach. Validation is meant in the sense that the game-theoretic solution engine (i.e., the Sequential Linear-Quadratic algorithm), acting on the Game Flow model, converges to a Nash solution that generally improves the value of the payoff function.

The  Game  Flow  model  simulates  a  two-force  game  where  the  assets  of  each  force,  say  the  blue  or  red 
forces,  are  distributed  over  a  large  geographical  area. 

In  this  experiment,  the  game  area  was  a  square  divided  into  64  square  cells.  At  the  start  of  the  game, 
the  two  forces  were  spread  uniformly  over  the  entire  game  area,  but  the  total  strength  of  the  blue  force 
was  only  two  thirds  the  total  strength  of  the  red  force.  To  counter  this  mismatch,  the  attack  range  of  the 
blue  force  was  larger,  and  the  cost  of  movement  for  the  blue  force  was  lower  than  that  of  the  red  force. 

The  goal  of  each  force  was  to  reach  the  end  of  the  game  with  a  minimum  loss  of  their  own  strength, 
while  inflicting  maximum  damage  to  the  opposing  force.  Also,  each  force  assigned  more  value  (larger 
weight)  to  the  cells  located  in  the  middle  of  the  game  area  than  to  the  cells  located  near  the  boundaries,  so 
higher  score  might  be  earned  by  finishing  the  game  with  heavier  strength  concentration  in  more  valuable 
cells.  Finally,  movement  of  assets  across  the  game  area  was  penalized,  so  economy  of  movement  was  also 
reflected  in  the  final  score  of  each  force. 

The  game  was  carried  out  for  a  specified  amount  of  time,  with  the  phases  of  the  game,  i.e.,  asset 
movement  and  attack,  evolving  uninterrupted  for  the  duration  of  the  game. 

The  SLQ  algorithm  was  used  to  find  a  Nash  equilibrium  solution  for  the  game.  In  this  experiment, 
the  solver  was  stopped  after  10  iterations,  when  the  error  (i.e.,  the  norm  of  the  velocity  updates)  was 
approximately  one  percent  of  the  original  error.  At  this  error  level,  further  iterations  had  an  insignificant 
effect  on  the  solution. 

Experimental results show that the Nash equilibrium solution found by the SLQ algorithm greatly improved the performance of the two forces with respect to the value of the payoff function selected for this experiment.

Qualitatively  speaking,  we  can  say  that,  in  this  scenario,  the  superiority  of  the  blue  force  in  the  attack 
range,  and  its  lower  cost  on  movement  prevailed,  allowing  the  blue  force  to  keep  the  red  force  out  of  the 
most  valuable  cells  in  the  middle  of  the  game  area. 


Experiment  13:  Discrete  Platform  Dynamics 


On any type of battlefield, the loss of a platform is usually a stochastic discrete event in time. In JFACC simulations, however, the number of platforms has been modeled as a real number representing its probabilistic expectation. Our game-theoretic controller, which is based on an expected-value model, therefore needs to be tested on a more realistic plant in which the numbers of platforms are integers. Our approach was first to develop a stochastic discrete-event model in which the number of platforms in each unit is an integer. For example, the number of platforms changes from 10 to 9 at one point and then to 8 later, based on the probability of kill. Moreover, the numbers of platforms evolve differently in different runs because random number generators control the time at which each kill occurs. Using this new model, we conducted multiple runs and took the average. This average was then compared against the results based on the expected-value model. We concluded that our game-theoretic controller (based on the simpler expected-value model) performed just as well when tested on this more realistic stochastic discrete-event plant model as when tested on the expected-value plant model.
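
The averaging procedure can be sketched with a toy model. The example below is an illustration under stated assumptions and is not the report's plant: a single unit of 10 platforms is exposed to a constant kill rate per platform (the names `simulate_platforms`, `average_survivors`, and `kill_rate` are hypothetical), each run produces an integer platform count, and the average over many runs is compared with the expected-value model $dn/dt = -\lambda n$, whose solution is $n_0 e^{-\lambda t}$.

```python
import math
import random

def simulate_platforms(n0, kill_rate, t_end, rng):
    """One discrete-event run: each platform is lost independently at kill_rate."""
    n, t = n0, 0.0
    while n > 0:
        t += rng.expovariate(kill_rate * n)   # random time until the next loss
        if t > t_end:
            break
        n -= 1                                # integer step, e.g. 10 -> 9 -> 8
    return n

def average_survivors(n0=10, kill_rate=0.1, t_end=5.0, runs=2000, seed=0):
    """Average the integer outcomes of many independent runs."""
    rng = random.Random(seed)
    total = sum(simulate_platforms(n0, kill_rate, t_end, rng) for _ in range(runs))
    return total / runs

mean_survivors = average_survivors()
expected_value = 10 * math.exp(-0.1 * 5.0)    # solution of the expected-value ODE
print(f"Monte Carlo average: {mean_survivors:.2f}, ODE value: {expected_value:.2f}")
```

For this linear toy model the expected-value ODE is exact, so the two numbers differ only by sampling error; with interacting units the agreement is an empirical question, which is what the experiment examines.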


Experiment 14: Non-linear Detector for the Fully Non-linear Model


In Chapters 9 and 10 we reported the results of experiments performed to test the effectiveness of a “game-theoretic-optimal” detection filter in processing noise-corrupted observations of the battlefield. In that series of experiments, a bilinear approximation of the non-linear model of the battlefield was considered and the filter was designed accordingly. When the fully non-linear model of the battlefield is considered, a different (non-linear) detection filter must be designed. The purpose of this chapter is to present the experimental results concerning the non-linear filter and to compare them with those obtained by using the detection filter designed on the basis of the bilinear model of the battlefield. For the sake of simplicity, only the case of noise-free measurements is considered in this series of experiments.


Experiment  15:  Comparison  with  Honeywell’s  Results 

A comparison of the platform loss and probability of success values is made between Washington University and Honeywell results on two example missions, each consisting of three sorties. The results are similar in the first example. Due to a change in the initial number of Red fighters and their probability of kill, the outcome of the second example is drastically different. It has also been observed that the selection of weights in the cost function may affect the unit trajectories and platform loss significantly.

Despite  running  our  Sequential  Linear-Quadratic  Method  for  50  iterations  or  more,  convergence  to  a 
possible  Nash  solution  was  not  achieved  in  either  example,  although  the  obtained  unit  trajectories  and 
platform  loss  numbers  were  reasonable,  given  the  mission  objectives. 


Experiment 16: Controller Computational Complexity: Correction

The  purpose  of  Experiment  16  is  to  correct  an  error  present  in  the  subprogram  that  evaluates  the  Jacobian 
of  the  model  MDCM.  This  error  would  have  affected  the  results  in  cases  in  which  multiple  units  are 
deployed  against  multiple  units  and  some  units  are  not  fired  upon.  This  error  affects  only  one  such  case 
in  the  Interim  Report  (experiments  1  through  12),  that  is  experiment  5.3.2.  Therefore,  a  corrected  version 
of  the  subprogram  for  computing  the  Jacobian  has  been  developed,  and  corrected  computational  results 
are  reported  in  this  chapter.  Even  with  this  change  we  can  draw  the  same  conclusions  as  in  Experiment 
5;  namely,  the  computational  time  is  a  quadratic  function  of  the  number  of  units. 


Experiment 17: Controller with a Kalman Filter for Estimation

In  this  chapter,  we  present  how  an  algorithm  based  on  the  Extended  Kalman  Filter  (EKF)  for  state 
estimation  is  used  in  a  differential  game,  which  models  the  air  operations  of  two  opposing  forces.  We 
show  the  overall  structure  of  the  game  in  a  block  diagram.  We  present  the  implementation  of  the  algorithm 
in  a  flowchart.  We  also  present  simulation  results. 

In an air operation game, it is reasonable to assume that one does not get direct information about the enemy’s input. In this chapter, we present an approach for estimating the states of the friendly as well as enemy forces and compare their respective simulation results. The Kalman filter due to Darouach et al. treats the enemy inputs as part of the extended state and obtains an estimate of both the state of the two forces and the input of the enemy. However, their filter is designed for linear time-invariant systems. Hence, we present an extension of their filter to a nonlinear time-varying system.

The extended Kalman filter algorithm presented in this report is capable of estimating the states of both forces in the presence of process noise as well as sensor noise. We note that the estimates of the enemy inputs are too noisy to be directly useful. However, our game-theoretic controller requires only an estimate of the enemy state and does not require any estimate of the enemy input. We thus observed that the game-theoretic controller remained effective when the extended Kalman filter was introduced in the loop.
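
For readers unfamiliar with the extended Kalman filter, the following sketch shows one generic predict/update step. It is the textbook EKF, not the extension of the Darouach et al. filter developed in this report, and it does not estimate unknown inputs. The toy dynamics `f` (a bilinear attrition term), the constants `DT` and `K_ATTR`, and the identity measurement model are assumptions chosen only to make the example self-contained.

```python
import numpy as np

DT, K_ATTR = 0.1, 0.01   # hypothetical step size and attrition coefficient

def f(x):
    """Toy discrete-time dynamics: both forces lose strength through a bilinear term."""
    loss = DT * K_ATTR * x[0] * x[1]
    return np.array([x[0] - loss, x[1] - loss])

def F(x):
    """Jacobian of f evaluated at x."""
    return np.array([[1 - DT * K_ATTR * x[1], -DT * K_ATTR * x[0]],
                     [-DT * K_ATTR * x[1], 1 - DT * K_ATTR * x[0]]])

def h(x):
    """Toy measurement model: both states are observed directly."""
    return x

def H(x):
    """Jacobian of h."""
    return np.eye(2)

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One extended Kalman filter step: predict with f, then correct with z."""
    x_pred = f(x)
    Fk = F(x)
    P_pred = Fk @ P @ Fk.T + Q                 # propagate covariance, add process noise
    Hk = H(x_pred)
    S = Hk @ P_pred @ Hk.T + R                 # innovation covariance with sensor noise
    K = P_pred @ Hk.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ Hk) @ P_pred
    return x_new, P_new

x0, P0 = np.array([8.0, 8.0]), np.eye(2)
Q, R = 0.01 * np.eye(2), 0.25 * np.eye(2)
z = np.array([7.8, 8.1])                       # made-up noisy measurement
x1, P1 = ekf_step(x0, P0, z, f, F, h, H, Q, R)
print(x1, np.trace(P1))
```

The matrices `Q` and `R` play the roles of the process noise and sensor noise mentioned above; their values here are arbitrary.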


Experiment  18:  Method  of  Characteristics:  Addendum 

The purpose of Experiment 11 was to verify that the solution computed by the Sequential Linear-Quadratic Method (SLQM) was the same as the Nash solution computed by the Method of Characteristics. We verified that the solutions computed by the Sequential Linear-Quadratic Method (SLQM) were indeed the same as the Nash solutions computed by the Method of Characteristics under several scenarios. However, the experiments in Chapter 11 all involved one Blue unit against one Red unit. In Experiment 18, we extend the results of Experiment 11 to a scenario of multiple units against multiple units. Specifically, Experiment 18 tests the Method of Characteristics for the case of three Blue units against three Red units.


Experiment  19:  New  Game  Flow  Models 

Military  operations  can  be  viewed  as  a  hierarchical  structure  in  which  actions  are  taken  by  individual 
units  at  a  low  level,  based  on  strategies  developed  by  planners  at  a  high  level.  In  this  experiment  we 
consider  the  situation  in  which  two  forces,  say  the  blue  and  red  forces,  control  a  large  number  of  units 
distributed  over  a  large  geographical  area.  We  develop  a  tool  that  is  useful  to  high-level  planners  in 
simulating  and  computing  the  optimal  strategy  for  the  two  forces.  We  also  report  the  results  of  our 
numerical  experiments. 

The  geographical  area  in  our  model  is  represented  by  an  abstract  game  board  that  is  divided  into 
cells  so  that  the  strength  concentration  of  the  blue  (resp.  red)  force  in  a  cell  is  defined  as  the  number  of 
blue  (resp.  red)  units  contained  in  the  cell  divided  by  the  area  of  the  cell.  The  game  is  concurrent  in  the 
sense  that  both  the  blue  and  red  forces  can  move  some  or  all  of  their  respective  units  simultaneously  and 
continuously  during  the  game. 
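
The definition of strength concentration can be written out directly. The sketch below assumes, for illustration only, a rectangular board stored as a nested list of unit counts with cells of equal area; the function name `strength_concentration` is hypothetical.

```python
def strength_concentration(unit_counts, cell_area):
    """Divide the number of units in each cell by the cell area."""
    return [[count / cell_area for count in row] for row in unit_counts]

# Hypothetical 2 x 2 board: blue unit counts per cell, each cell of area 2.0.
blue_units = [[4, 0],
              [2, 6]]
blue_concentration = strength_concentration(blue_units, cell_area=2.0)
print(blue_concentration)
```

The same function applies unchanged to the red force, giving one concentration field per side over the abstract game board.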

We formulated the military operation control problem as a differential game over the abstract game board. The differential game consists of a quadratic payoff function and a set of ordinary differential equations describing the system dynamics of the unit distribution over the discretized geographical area (the abstract game board).

In  order  to  solve  such  a  geographically  distributed  differential  game,  we  developed  a  computer  method 
for  finding  a  local  Nash  solution  to  the  adversarial  game.  The  optimum  strategy  for  each  team  is  found 
using  the  iterative  algorithm  called  Sequential  Linear-Quadratic  Method.  Experimental  results  are  also 
presented  that  demonstrate  the  validity  of  this  concept. 


Chapter 1

Experiment 1: Combat Modeling and Validation

1.1  Executive  Summary 

The  purpose  of  this  experiment  is  to  validate  the  low-order  ordinary  differential  equation  (ODE)  models, 
which  are  derived  to  approximate  the  evolution  of  expected  values  in  a  more  realistic  hybrid-stochastic 
model,  called  the  Probabilistic  Mission  Dynamics  Model  (PMDM),  under  different  assumptions  for  target 
acquisition  and  target  selection  coordination. 

The  hypothesis  is  that  the  evolution  of  the  expected  values  of  the  Markov  chain  (MC)  mission  dynamics 
can  be  approximated  by  a  low-order  ordinary  differential  equation  (ODE)  model,  for  a  time  period  of 
sufficient  duration,  when  the  control  signals  are  generated  in  an  open-loop  setting. 

This  study  first  identifies  four  different  sets  of  assumptions  about  target  acquisition  and  target  selection 
coordination,  which  will  be  abbreviated  as: 

MARI  (Acquisition  Rate  Independent)  Uncoordinated  target  selection,  independent  target  acquisition, 

MARD  (Acquisition  Rate  Dependent)  Uncoordinated  target  selection,  linear  target  acquisition, 

MNWA  (No  Wrap-around)  Coordinated  target  selection,  without  wrap-around, 

MWWA  (With  Wrap-around)  Coordinated  target  selection,  with  wrap-around. 

Then, continuous-transition Markov chain (MC) models are developed under these assumptions. Using the MC models, the probability distributions for the number of platforms and their exact expected values $\eta^B, \eta^R$ are calculated as a function of time. Next, approximate ODE models of the evolution of the expected values are derived.

The trajectories of the ODE models $(\hat{\eta}^B, \hat{\eta}^R)$ are compared with the exact outcomes $(\eta^B, \eta^R)$ in twelve experiments, summarized in the table below, which indicates the initial number of platforms and the probability of kill for the Blue and Red units.


Scenarios Used in the Experiments

Scenario types ($N^B = 8$, $P_k^B = 0.8$):

Scenario   Initial platforms   Probability of kill
A          $N^B = N^R$         $P_k^B = P_k^R$
B          $N^B = 2N^R$        $P_k^B = P_k^R$
C          $N^B = N^R$         $P_k^B = 2P_k^R$

The approximation quality of the ODE models is compared in the following table using the $L_2$ norm:

$L_2[0, 10]$-Norm of Error Between Actual and Approximate Expected Values

Exp. #   Model   $\|\eta^B - \hat{\eta}^B\|$   $\|\eta^R - \hat{\eta}^R\|$   Sum of both norms
1        MARI    0.14                          0.14                          0.28
2        MARI    0.37                          1.35                          1.72
3        MARI    0.01                          0.25                          0.26
4        MARD    0.06                          0.06                          0.12
5        MARD    0.03                          0.06                          0.09
6        MARD    0.02                          0.04                          0.06
7        MNWA    1.35                          1.35                          2.70
8        MNWA    5.80                          5.80                          11.60
9        MNWA    0.02                          0.04                          0.06
10       MWWA    1.53                          1.53                          3.06
11       MWWA    4.74                          0.06                          4.80
12       MWWA    0.16                          0.58                          0.74

It was observed that the ODE models were good approximations under the uncoordinated target selection assumption, and were found to be sufficient to represent the attrition dynamics in a differential game setup in this case. The discrepancy between the MC and ODE models increases as the engagement proceeds, which should be expected. Under the coordinated target selection assumption, the ODE approximations were worse. This can be partially explained by the fact that coordination implies firing in rounds, and therefore platform loss is more discrete in nature.


1.2  Purpose  of  the  Experiment 

The  purpose  of  this  experiment  is  to  validate  the  low-order  ordinary  differential  equation  (ODE)  models, 
which  are  derived  to  approximate  the  evolution  of  expected  values  in  a  more  realistic  hybrid-stochastic 
model,  called  the  Probabilistic  Mission  Dynamics  Model  (PMDM),  under  different  assumptions  for  target 
acquisition  and  target  selection  coordination. 


1.3  Hypothesis  to  Prove  or  Disprove 

The  hypothesis  is  that  the  evolution  of  the  expected  values  of  the  Markov  chain  (MC)  mission  dynamics 
can  be  approximated  by  a  low-order  ordinary  differential  equation  (ODE)  model,  for  a  time  period  of 
sufficient  duration,  when  the  control  signals  are  generated  in  an  open-loop  setting.  For  the  closed-loop 
situation,  see  Chapter  7. 


1.4  Introduction 

We  consider  a  geographical  area,  a  theater  of  air  operations,  in  which  two  forces  oppose  each  other  and 
try  to  accomplish  their  respective  mutually  conflicting  air  missions.  For  example,  two  forces  may  be 
operating  in  an  area,  in  which  the  ground  force  of  one  side  tries  to  invade  the  other  side  while  the  air 
force  of  the  other  side  tries  to  stop  the  invasion. 

Attrition  dynamics  are  inherently  probabilistic,  yet  most  attrition  models  approximate  the  dynamics 
with  a  deterministic  system.  This  chapter  shows  the  predictive  capabilities  of  a  probabilistic  model 
using  Markov  Chains,  and  illustrates  different  methods  to  approximate  the  probabilistic  model  with  a 
deterministic  model. 

In this chapter we study the combat between one Blue unit and one Red unit. Modeling multiple
units is more involved and may be the subject of future work. (A multi-unit model, based on heuristic
arguments, is included as an appendix to this chapter.) Each unit is homogeneous, that is, each unit
consists of only one type of platform equipped with the same type of weapons. Platforms can be SEADS,
ground troops, bombers, interceptors, etc. The scenario data consist of quantities such as
the probability of destroying a targeted platform, firing intensities and speed controls. Firing intensity is
a control which the mission commander (controller) may change in a dynamic game situation.

The model outcome is mainly concerned with the expected values of the number of platforms for each
unit at the end of a mission. We present approximations to these expected values in the form of
deterministic ODEs. The modeling covers two cases: uncoordinated target selection (any platform
that acquires a target may fire a weapon) and coordinated target selection (the unit commander controls
how the platforms target). We will see that some of these approximations are very good, and some are
not.

The  chapter  is  organized  as  follows:  a  presentation  of  the  general  background  to  the  model,  a  derivation 
of  the  model  for  uncoordinated  target  selection,  a  derivation  of  the  model  for  coordinated  target  selection, 
the  evolution  of  the  expected  number  of  platforms  for  both  uncoordinated  and  coordinated  cases  with 
their  approximations,  numerical  experiments,  and  some  ideas  of  what  to  do  next.  An  abridged  version 
has  been  published  as  [1]. 

1.4.1  Notation 

We will denote the quantities which belong to the forces Blue and Red with superscripts $B$ and $R$. If
the symbols in an expression do not have superscripts, this is intended to mean that the expression is
valid for either force.

The assumptions which appear to be reasonable for the situation we are modeling will be preceded
by M.... Simplifying assumptions, which are introduced for the sake of mathematical convenience, but
may not always hold, will be preceded with S.... Postulates, which we believe to reflect the "nature of
things", and should be satisfied with rare exceptions, will be preceded with P....


1.5  Random  Variables 

The positions of the units are $\xi^B$ and $\xi^R$, which are deterministic quantities. On the other hand, the
number of platforms will depend on chance occurrences. To capture this, define the random variables

$X^B(t) :=$ number of platforms in the Blue unit at time $t$,

$X^R(t) :=$ number of platforms in the Red unit at time $t$.

The initial values are known to be $X^B(0) = N^B$, $X^R(0) = N^R$. At any given time, the commanders (or
controllers) can observe only the expected values, denoted by

$\eta^B(t) := E[X^B(t)], \quad \eta^R(t) := E[X^R(t)], \quad \eta := [\eta^B, \eta^R]^T.$


Each unit has a fire intensity control $\nu \in [0, 1]$. This can be interpreted as the frequency of firing a
weapon per target acquisition (i.e., the probability of firing, given a target is acquired).

After acquiring a target, a platform fires a salvo of $s$ weapons (with probability $\nu$). This will decrease
the weapons load of this platform. Define the random variables

$W^B(t) :=$ number of salvo loads per platform in the Blue unit at time $t$,

$W^R(t) :=$ number of salvo loads per platform in the Red unit at time $t$.

Again, at any given time, the commanders (or controllers) can observe only the expected values,

$\zeta^B(t) := E[W^B(t)], \quad \zeta^R(t) := E[W^R(t)], \quad \zeta := [\zeta^B, \zeta^R]^T.$


Therefore  the  fire  intensity  controls  are  deterministic  quantities,  v  =  (). 

Consider the Blue unit firing on the Red unit. Let the probability of kill for each weapon, given it is
fired, be $P\{\text{wkill}^B \mid \text{fired}^B\}$. The probability of killing the target with a salvo of $s^B$ weapons, given that
they are fired simultaneously, is

$$P_k^B = P\{\text{kill}^B \mid \text{fired}^B\} = \begin{cases} 1 - \left(1 - P\{\text{wkill}^B \mid \text{fired}^B\}\right)^{s^B} & \text{if } W^B(t) > 0, \\ 0 & \text{if } W^B(t) = 0. \end{cases}$$

(1.1)
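
As a small numerical illustration of (1.1), the following Python sketch evaluates the salvo kill probability. The function and argument names are illustrative assumptions and do not come from the original model code.

```python
def salvo_kill_probability(p_weapon_kill: float, salvo_size: int, salvo_loads: float) -> float:
    """Kill probability of one salvo, following equation (1.1).

    p_weapon_kill : single-weapon kill probability, P{wkill | fired}
    salvo_size    : number of weapons s fired simultaneously
    salvo_loads   : remaining salvo loads W(t) on the firing platform
    """
    if salvo_loads <= 0:
        return 0.0  # no weapons left, so the platform cannot kill
    # The target survives only if every weapon in the salvo misses.
    return 1.0 - (1.0 - p_weapon_kill) ** salvo_size


# Two weapons at 0.5 each give a salvo kill probability of 0.75.
p_salvo = salvo_kill_probability(0.5, 2, 3)
```

For a single-weapon salvo the result reduces to the per-weapon kill probability, as expected.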
1.6 Uncoordinated Target Selection


Consider  two  (homogeneous)  units  of  the  opposing  forces  engaged  in  a  battle,  in  which  platforms  of  both 
sides  are  shooting  at  each  other  simultaneously.  During  this  engagement,  each  platform  searches  for  an 
enemy  platform  (in  the  sky,  on  the  ground,  etc.).  When  a  platform  is  located  in  space,  identified  to  be 
an  enemy  and  the  weapon  system  is  locked  on  to  this  platform,  a  target  is  said  to  be  acquired.  Target 
acquisition  is  a  stochastic  process,  in  which  events  occurring  on  disjoint  intervals  of  time  are  assumed  to 
be  independent. 

The following assumptions appear to be reasonable for air-to-air or air-to-ground combat with modern,
electronically guided weapon systems:

MNCO:  Friendly  platforms  do  not  communicate  for  target  selection  (uncoordinated  selection). 
The  exception  is  when  a  platform,  which  has  depleted  its  supply  of  weapons,  locates  and  identifies 
a  target.  In  that  case,  this  platform  will  relay  this  information  to  a  friendly  platform  in  the  same 
unit  which  has  weapons. 

MPKD:  The  probability  of  killing  the  target  depends  on  the  distance  between  the  units. 

MNWT:  The  time  it  takes  for  missiles,  bombs,  and  other  weapons  to  reach  the  target  is  negligible. 

From MPKD, (1.1) will depend on the distance between units through a function $\psi : \mathbb{R} \to [0, 1]$ which
depends on the positions of each unit, $\psi(\|\xi^B - \xi^R\|)$. Therefore the probability of a platform killing its
target, given it is assigned to a target, is

$P_k^B \psi^B \nu^B :=$ probability of a blue platform killing a red platform with one salvo,

$P_k^R \psi^R \nu^R :=$ probability of a red platform killing a blue platform with one salvo.

There are two different situations for acquisition rate and self-attrition:

MARI: The target acquisition rate does not depend on the number of enemy platforms or their
distribution in space. (Search devices are advanced enough that they will locate enemy platforms
efficiently even when they are distributed sparsely.)

MNSA: Self-attrition or equipment breakdowns are negligible.

and

MARD: The target acquisition rate does depend on the number of enemy platforms.

MWSA: Self-attrition or equipment breakdowns are not negligible.


To cover both MARI and MARD, considering a Blue platform, define a function $\sigma^B : \mathbb{N} \to \mathbb{R}$ as

$$\sigma^B(m) = \begin{cases} 1, & m \ge \bar{\sigma}^B, \\ m / \bar{\sigma}^B, & m < \bar{\sigma}^B, \end{cases}$$

(1.2)

where $\bar{\sigma}^B$ is the value at which the Blue platform's acquisition rate saturates. If $\bar{\sigma}^B = 1$, we have an
acquisition rate that is independent of the number of enemy platforms (MARI); otherwise there is linear
dependence on the number of enemy platforms (MARD), until saturation is reached.

Now, consider the Blue unit firing at the Red unit. Let $\alpha^B$ be the maximum rate at which a Blue platform
acquires a Red platform as a target. Given $X^B(t) = n$ and $X^R(t) = m$, from MPKD with Equations
(1.1) and (1.2), the loss rate of Red platforms due to one Blue platform's fire is

$\sigma^B(m)\,\lambda^R$, with $\lambda^R := P_k^B\,\psi^B(\|\xi^B - \xi^R\|)\,\alpha^B \nu^B$.

From MNCO, the loss rate for the Red unit due to all platforms of the Blue unit is $\sigma^B(m)\,\lambda^R(t)\,n$. The
loss rate for the Blue unit is $\sigma^R(n)\,\lambda^B(t)\,m$, by symmetry.
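
The saturating acquisition factor and the resulting unit loss rates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the sample numbers (kill probability 0.8, acquisition rate 0.1, eight platforms) are borrowed from the experiment settings later in the chapter purely as an example.

```python
def sigma(m: int, sigma_bar: float) -> float:
    """Saturating acquisition factor of equation (1.2)."""
    return 1.0 if m >= sigma_bar else m / sigma_bar


def per_platform_rate(p_kill: float, psi: float, alpha: float, nu: float) -> float:
    """lambda = P_k * psi * alpha * nu, the kill rate of one firing platform."""
    return p_kill * psi * alpha * nu


def unit_loss_rate(n_shooters: int, m_targets: int, lam: float, sigma_bar: float) -> float:
    """Loss rate of the targeted unit: n * sigma(m) * lambda."""
    return n_shooters * sigma(m_targets, sigma_bar) * lam


lam_red = per_platform_rate(p_kill=0.8, psi=1.0, alpha=0.1, nu=1.0)
# MARI-like setting (saturation at 1): every Blue platform fires at full rate.
rate_mari = unit_loss_rate(8, 4, lam_red, sigma_bar=1)
# MARD-like setting (saturation at 8): with 4 targets left the rate is halved.
rate_mard = unit_loss_rate(8, 4, lam_red, sigma_bar=8)
```

With no targets left (m = 0) the factor is zero in both settings, so the loss rate vanishes.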

Next, when self-attrition is not negligible (MWSA), we may define an independent process describing
the self-attrition rate for a platform, $\beta^B$ for the Blue unit platforms and $\beta^R$ for the Red unit platforms.
Then the self-attrition rate due to all platforms in the Blue unit is $n\beta^B$, and the rate due to all platforms
in the Red unit is $m\beta^R$. When self-attrition is negligible (MNSA), $\beta = 0$.
1.6.1 Defining the Markov Chain

Under the previous assumptions, we can postulate a two-dimensional non-homogeneous continuous-time
Markov chain for the platform dynamics, with state space $\{0, \dots, N^B\} \times \{0, \dots, N^R\}$, by specifying the
state transition probabilities from time $t$ to time $t + h$, where $h$ is small:

PDI: The numbers of losses in disjoint intervals are independent.

PKB: P{exactly one Blue killed} =

$P\{X^B(t+h) = n-1,\ X^R(t+h) = m \mid X^B(t) = n,\ X^R(t) = m\} = \left(m\,\sigma^R(n)\,\lambda^B(t) + n\beta^B\right)h + o(h)$

PKR: P{exactly one Red killed} =

$P\{X^B(t+h) = n,\ X^R(t+h) = m-1 \mid X^B(t) = n,\ X^R(t) = m\} = \left(n\,\sigma^B(m)\,\lambda^R(t) + m\beta^R\right)h + o(h)$

PMD: P{two or more deaths} =

$\begin{cases} o(h), & n + m \ge 2, \\ 0, & \text{otherwise.} \end{cases}$

PRE: P{resurrection} =

$P\{X^B(t+h) > n \,\cup\, X^R(t+h) > m \mid X^B(t) = n,\ X^R(t) = m\} = 0$

PND: It follows that P{no death} =

$P\{X^B(t+h) = n,\ X^R(t+h) = m \mid X^B(t) = n,\ X^R(t) = m\} = 1 - \left(m\,\sigma^R(n)\,\lambda^B(t) + n\beta^B + n\,\sigma^B(m)\,\lambda^R(t) + m\beta^R\right)h + o(h).$


Using the standard notation

$\Pi_{n,m}(t) := P\{X^B(t) = n,\ X^R(t) = m\},$  (1.3)

the evolution of the state probabilities is described by

$$\frac{d\Pi_{n,m}}{dt} = -\left(m\,\sigma^R(n)\lambda^B + n\beta^B + n\,\sigma^B(m)\lambda^R + m\beta^R\right)\Pi_{n,m} + \left(m\,\sigma^R(n+1)\lambda^B + (n+1)\beta^B\right)\Pi_{n+1,m} + \left(n\,\sigma^B(m+1)\lambda^R + (m+1)\beta^R\right)\Pi_{n,m+1},$$

where the time arguments have been dropped for brevity. If we stack all components of $\Pi_{n,m}$ into a row
vector $\Pi$, with $(N^B + 1)(N^R + 1)$ elements, the above differential equation can be written

$\frac{d}{dt}\Pi(t) = \Pi(t)\,Q(t)$  (1.4)

where $Q$ is called the transition rate matrix or the infinitesimal generator of the process.
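
To make the stacked form (1.4) concrete, the sketch below builds the generator for the uncoordinated model and propagates the state probabilities with a classical Runge-Kutta step. It is a minimal illustration, not the code used for the reported experiments: the function names and the state ordering are assumptions, and the rates are held constant in time (the report allows them to vary with t).

```python
def sigma(m, sigma_bar):
    """Saturating acquisition factor, equation (1.2)."""
    return 1.0 if m >= sigma_bar else m / sigma_bar


def build_generator(NB, NR, lam_B, lam_R, sbar_B=1, sbar_R=1, beta_B=0.0, beta_R=0.0):
    """Dense transition-rate matrix Q; state (n, m) has index n * (NR + 1) + m."""
    size = (NB + 1) * (NR + 1)
    Q = [[0.0] * size for _ in range(size)]
    for n in range(NB + 1):
        for m in range(NR + 1):
            i = n * (NR + 1) + m
            blue_loss = m * sigma(n, sbar_R) * lam_B + n * beta_B  # PKB rate
            red_loss = n * sigma(m, sbar_B) * lam_R + m * beta_R   # PKR rate
            if n > 0:
                Q[i][i - (NR + 1)] = blue_loss  # (n, m) -> (n - 1, m)
                Q[i][i] -= blue_loss
            if m > 0:
                Q[i][i - 1] = red_loss          # (n, m) -> (n, m - 1)
                Q[i][i] -= red_loss
    return Q


def propagate(Pi, Q, t_end, steps):
    """Integrate dPi/dt = Pi Q with classical RK4, holding Q constant in time."""
    size = len(Pi)
    h = t_end / steps

    def rhs(p):
        return [sum(p[i] * Q[i][j] for i in range(size)) for j in range(size)]

    for _ in range(steps):
        k1 = rhs(Pi)
        k2 = rhs([p + 0.5 * h * k for p, k in zip(Pi, k1)])
        k3 = rhs([p + 0.5 * h * k for p, k in zip(Pi, k2)])
        k4 = rhs([p + h * k for p, k in zip(Pi, k3)])
        Pi = [p + h / 6.0 * (a + 2 * b + 2 * c + d)
              for p, a, b, c, d in zip(Pi, k1, k2, k3, k4)]
    return Pi


def expected_platforms(Pi, NB, NR):
    """Return (E[X^B], E[X^R]) from the stacked state probabilities."""
    eta_B = sum(n * Pi[n * (NR + 1) + m] for n in range(NB + 1) for m in range(NR + 1))
    eta_R = sum(m * Pi[n * (NR + 1) + m] for n in range(NB + 1) for m in range(NR + 1))
    return eta_B, eta_R


# Two platforms per side, equal rates; all probability starts in state (2, 2).
NB = NR = 2
Q = build_generator(NB, NR, lam_B=0.08, lam_R=0.08)
Pi0 = [0.0] * ((NB + 1) * (NR + 1))
Pi0[-1] = 1.0
Pi = propagate(Pi0, Q, t_end=5.0, steps=200)
eta_B, eta_R = expected_platforms(Pi, NB, NR)
```

Each row of Q sums to zero, so total probability is conserved by the integration. The dense list-of-lists layout is chosen for readability only; for the unit sizes in the experiments a sparse representation would be the more natural choice.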
1.7 Coordinated Target Selection

In  this  case,  one  can  no  longer  talk  about  a  target  acquisition  process.  In  order  to  decide  which  enemy 
platforms  will  be  a  target,  the  unit  commander  will  need  to  know  the  number  and  type  of  platforms 
in  the  enemy  unit.  Target  selection  coordination  implies  less  independence  for  each  platform.  Now  the 
platform  commanders  are  only  executing  strict  orders,  or  what  their  training  dictates,  when  their  unit 
engages  an  enemy  unit. 

Coordination  also  implies  that  target  selection  takes  place  in  rounds,  from  the  unit  commander’s 
perspective. 

MWCO: At the beginning of each round the unit commander assigns each of his platforms an
enemy platform as a target. During the round firing takes place, and platforms may be killed on
both sides. Then the unit commander assigns targets for the next round.

There  are  two  different  situations: 

MRTI: The inter-arrival time between rounds (round length) is independent of the number of
platforms,  and  independent  of  the  previous  round  length.  Both  sides  fire  simultaneously. 

MRTK:  A  new  round  begins  only  when  a  friendly  or  enemy  platform  is  killed. 

MRTI  is  appropriate  for  artillery  duels,  for  bombers  versus  ground  troops  or  bombers  versus  air  defense. 
MRTK  may  be  appropriate  for  air-to-air  combat.  In  both  cases,  the  round  length  is  the  same  for  both 
Blue  and  Red.  We  focus  only  on  MRTI  in  this  report. 

There are two situations for target selection, with wrap-around and without wrap-around:

MNWA:  (No  wrap  around)  At  the  beginning  of  a  round,  each  platform  is  assigned  to  a  unique 
target.  If  there  are  more  platforms  than  targets,  the  remaining  platforms  do  not  participate  in  the 
firing  (although  they  may  be  targeted  by  the  enemy). 

MWWA:  (With  wrap  around)  At  the  beginning  of  a  round,  each  platform  is  assigned  to  a  unique 
target,  until  all  platforms  have  assignments.  If  there  are  more  platforms  than  targets,  the  end  of 
the  target  list  wraps  around  to  the  beginning. 

As in the uncoordinated case we have

$P_k^B \psi^B \nu^B :=$ probability of a blue platform killing a red platform with one salvo,

$P_k^R \psi^R \nu^R :=$ probability of a red platform killing a blue platform with one salvo.

The time between rounds is needed for the platforms to reload their weapons, for bombers to turn
back, for determination of losses and kills, and for decision making for the assignments. From MRTI,
the arrival of rounds is a Poisson process with parameter $\rho$ (which may depend on the unit category).

1.7.1  Case  1  :  Independent  Round  Arrival,  No  Wrap-Around 

Suppose a round occurs in $(t, t+h]$. Given $X^B(t) = n$ and $X^R(t) = m$, there will be $\min(n, m)$ target
assignments on each side, and $\min(n, m)\,\nu$ shots will be taken. Consider the number of losses on the
Red side. Since each shot taken in a round is independent, each can be regarded as a Bernoulli trial with
probability of success $P_k^B \psi^B \nu^B$. Thus the probability that Red will lose $k$ platforms, given the round
started at time $t$, is

$$G^R(n, m, k, t) := P\{X^R(t+h) = m-k \mid X^B(t) = n,\ X^R(t) = m,\ \text{one round in } (t, t+h]\} = \binom{\min(n,m)}{k} \left(P_k^B \psi^B \nu^B\right)^k \left(1 - P_k^B \psi^B \nu^B\right)^{\min(n,m)-k}.$$  (1.5)

Note that this is valid only when $k \le \min(n, m)$, so one should define

$$\binom{a}{b} := \begin{cases} \dfrac{a!}{b!\,(a-b)!}, & a \ge b, \\ 0, & a < b. \end{cases}$$

Similarly, for Blue losing $l$ platforms, given the round started at time $t$,

$$G^B(n, m, l, t) := P\{X^B(t+h) = n-l \mid X^B(t) = n,\ X^R(t) = m,\ \text{one round in } (t, t+h]\} = \binom{\min(n,m)}{l} \left(P_k^R \psi^R \nu^R\right)^l \left(1 - P_k^R \psi^R \nu^R\right)^{\min(n,m)-l}.$$  (1.6)

Since both sides fire simultaneously, the number of losses on one side is independent of the number of
losses on the other side. Thus for $(l, k) \ne (0, 0)$,

$$P\{X^B(t+h) = n-l,\ X^R(t+h) = m-k \mid X^B(t) = n,\ X^R(t) = m,\ \text{one round in } (t, t+h]\} = G^B(n, m, l, t)\,G^R(n, m, k, t).$$
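
The no-wrap-around round can be sketched as a pair of independent binomial draws. The snippet below is a hedged illustration: the function names are hypothetical, and `p` stands for the single-salvo kill probability (the product of kill probability, distance factor and fire intensity) of the firing side.

```python
from math import comb


def loss_pmf_no_wrap(n: int, m: int, k: int, p: float) -> float:
    """Probability that the targeted side loses k platforms in one round.

    With n platforms on one side and m on the other there are min(n, m)
    target assignments, each an independent Bernoulli trial with success
    probability p. Out-of-range k has probability zero.
    """
    shots = min(n, m)
    if k < 0 or k > shots:
        return 0.0
    return comb(shots, k) * p**k * (1.0 - p) ** (shots - k)


def joint_loss_pmf(n: int, m: int, l: int, k: int, p_blue: float, p_red: float) -> float:
    """Blue loses l and Red loses k in the same round (the sides fire independently)."""
    return loss_pmf_no_wrap(n, m, l, p_red) * loss_pmf_no_wrap(n, m, k, p_blue)


# Eight Blue platforms against four Red: only four shots per side per round.
red_loss_distribution = [loss_pmf_no_wrap(8, 4, k, 0.8) for k in range(5)]
```

The explicit range check plays the role of the convention that the binomial coefficient is zero outside its valid range.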

1.7.2  Case  2  :  Independent  Round  Arrival,  With  Wrap-Around 

Suppose a round occurs in $(t, t+h]$. Given $X^B(t) = n$ and $X^R(t) = m$, the minimum number of
platforms assigned to fire at a Red platform by the Blue unit is

$$f^B = \begin{cases} \lfloor n/m \rfloor, & m > 0, \\ 0, & m = 0, \end{cases}$$

where $\lfloor \cdot \rfloor$ is the truncation function. Out of the $n$ Blue platforms, $m f^B$ of them will be assigned regularly,
and

$n \bmod m = n - m f^B$

of them will be assigned as extras (the wrap-around). Then $n - m f^B$ Red platforms will receive $f^B + 1$
shots and $m(f^B + 1) - n$ Red platforms will receive $f^B$ shots, given that all Blue platforms choose to fire.

The number of Red losses, given a round in $(t, t+h]$, is the sum of Red losses in the group which
receives $f^B$ shots, plus the Red losses in the group which receives $f^B + 1$ shots. The '$f^B$' group has
$m(f^B + 1) - n := b_0$ Bernoulli trials, each with probability of success

$B_0 := 1 - \left(1 - P_k^B \psi^B \nu^B\right)^{f^B}.$

The '$f^B + 1$' group has $n - m f^B := b_1$ Bernoulli trials, each with probability of success

$B_1 := 1 - \left(1 - P_k^B \psi^B \nu^B\right)^{f^B + 1}.$

Thus the probability that Red will lose $k$ platforms, given the round started at time $t$, is

$$G^R(n, m, k, t) := P\{X^R(t+h) = m-k \mid X^B(t) = n,\ X^R(t) = m,\ \text{one round in } (t, t+h]\} = \begin{cases} \displaystyle\sum_{q=0}^{k} \binom{b_0}{q} B_0^{\,q} (1 - B_0)^{b_0 - q} \binom{b_1}{k-q} B_1^{\,k-q} (1 - B_1)^{b_1 - k + q}, & f^B \ne 0, \\[2ex] \displaystyle\binom{n}{k} B_1^{\,k} (1 - B_1)^{n-k}, & f^B = 0. \end{cases}$$  (1.7)

A similar argument holds for Blue to find $G^B(n, m, l, t)$. Since both sides fire simultaneously, the number
of losses on one side is independent of the number of losses on the other side. Thus with $(l, k) \ne (0, 0)$,

$$P\{X^B(t+h) = n-l,\ X^R(t+h) = m-k \mid X^B(t) = n,\ X^R(t) = m,\ \text{one round in } (t, t+h]\} = G^B(n, m, l, t)\,G^R(n, m, k, t).$$
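
The wrap-around loss distribution is the convolution of two binomials, one per group of Red platforms. The sketch below illustrates this for Red losses; the names are hypothetical and `p` again stands for Blue's single-salvo kill probability.

```python
from math import comb


def binom_pmf(trials: int, k: int, p: float) -> float:
    """Binomial probability, zero outside 0 <= k <= trials."""
    if k < 0 or k > trials:
        return 0.0
    return comb(trials, k) * p**k * (1.0 - p) ** (trials - k)


def red_loss_pmf_wrap(n: int, m: int, k: int, p: float) -> float:
    """Probability that Red loses k of its m platforms in one wrap-around round."""
    if m == 0:
        return 1.0 if k == 0 else 0.0
    f = n // m                       # minimum number of shooters per Red platform
    b0 = m * (f + 1) - n             # Red platforms receiving f shots
    b1 = n - m * f                   # Red platforms receiving f + 1 shots
    B0 = 1.0 - (1.0 - p) ** f        # kill probability in the 'f' group
    B1 = 1.0 - (1.0 - p) ** (f + 1)  # kill probability in the 'f + 1' group
    # Convolution over q, the number of losses in the 'f' group.
    return sum(binom_pmf(b0, q, B0) * binom_pmf(b1, k - q, B1) for q in range(k + 1))


# Eight Blue platforms against three Red: two Red platforms draw three shots, one draws two.
pmf = [red_loss_pmf_wrap(8, 3, k, 0.8) for k in range(4)]
mean_red_loss = sum(k * pk for k, pk in enumerate(pmf))
```

When n is smaller than m, f is zero, the 'f' group cannot lose anything, and the sum collapses to the single binomial of the second branch of (1.7).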
1.7.3 Defining the Markov Chain

First set $F(l, k, n, m, t) := G^B(n, m, l, t)\,G^R(n, m, k, t)$, where $G^B(n, m, l, t)$ and $G^R(n, m, k, t)$ are defined
in Equations (1.5), (1.6) and (1.7) for both cases. Under the previous assumptions, we can postulate a
two-dimensional non-homogeneous continuous-time Markov chain for the platform dynamics, with state
space $\{0, \dots, N^B\} \times \{0, \dots, N^R\}$, by specifying the state transition probabilities from time $t$ to time $t + h$,
where $h$ is small:

PDI: The numbers of losses in disjoint intervals are independent.

PKBR: $(l, k) \ne (0, 0)$

$P\{X^B(t+h) = n-l,\ X^R(t+h) = m-k \mid X^B(t) = n,\ X^R(t) = m\} = F(l, k, n, m, t)\,(\rho h + o(h))$

PND: $(l, k) = (0, 0)$

$$P\{X^B(t+h) = n,\ X^R(t+h) = m \mid X^B(t) = n,\ X^R(t) = m\} = (1 - \rho h + o(h)) + F(0, 0, n, m, t)\,(\rho h + o(h)) + \underbrace{P\{\text{losing } l \text{ and } k \text{ in 2 or more rounds}\}\; P\{\text{2 or more rounds occur}\}}_{o(h)}$$

Using the standard notation (1.3), the evolution of the state probabilities is described by

$$\dot{\Pi}_{n,m}(t) = \left(F(0, 0, n, m, t) - 1\right)\rho\,\Pi_{n,m} + \sum_{(l,k) \ne (0,0)} F(l, k, n+l, m+k, t)\,\rho\,\Pi_{n+l,m+k}$$  (1.8)

where the time arguments have been dropped for brevity. If we stack all components of $\Pi_{n,m}$ into a row
vector $\Pi$, with $(N^B + 1)(N^R + 1)$ elements, we have Equation (1.4).

1.8  Evolution  of  Expected  Values 

1.8.1  Uncoordinated  Target  Selection 
Independent  Target  Acquisition  (MNCO,  MARI) 

With independent target acquisition, we have $\bar{\sigma} = 1$. Under this assumption we may write the evolution
of the expected number of Blue platforms as

$$\dot{\eta}^B(t) = \sum_{n=0}^{N^B} \sum_{m=0}^{N^R} n\,\dot{\Pi}_{n,m}(t) = -\lambda^B(t)\,\eta^R(t) + \lambda^B(t) \sum_{m=1}^{N^R} m\,\Pi_{0,m}(t).$$  (1.9)

Similarly, the evolution of the expected number of Red platforms is

$$\dot{\eta}^R(t) = \sum_{n=0}^{N^B} \sum_{m=0}^{N^R} m\,\dot{\Pi}_{n,m}(t) = -\lambda^R(t)\,\eta^B(t) + \lambda^R(t) \sum_{n=1}^{N^B} n\,\Pi_{n,0}(t).$$  (1.10)

A way of developing an approximation of the expected values is to drop the extra terms in (1.9) and
(1.10). This gives a deterministic ODE approximating the expected number of platforms,

$\dot{\hat{\eta}}^B = -\hat{\eta}^R\,P_k^R \psi^R \nu^R \alpha^R,$  (1.11)

$\dot{\hat{\eta}}^R = -\hat{\eta}^B\,P_k^B \psi^B \nu^B \alpha^B.$  (1.12)
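
The approximate ODEs (1.11) and (1.12) are linear and easy to integrate numerically. The following sketch uses a fixed-step RK4 scheme; the function name is hypothetical, the coefficients are held constant (an assumption, since the controls may vary with time), and the sample values follow the symmetric setting with eight platforms, kill probability 0.8 and acquisition rate 0.1 used later in the experiments.

```python
from math import exp


def integrate_mari(eta_B, eta_R, lam_B, lam_R, t_end, steps):
    """RK4 for d(eta_B)/dt = -lam_B * eta_R and d(eta_R)/dt = -lam_R * eta_B.

    lam_B is the Blue loss coefficient (Red's kill rate per platform) and
    lam_R the Red loss coefficient. The result is meaningful only while
    both expected values remain positive.
    """
    def rhs(b, r):
        return -lam_B * r, -lam_R * b

    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(eta_B, eta_R)
        k2 = rhs(eta_B + 0.5 * h * k1[0], eta_R + 0.5 * h * k1[1])
        k3 = rhs(eta_B + 0.5 * h * k2[0], eta_R + 0.5 * h * k2[1])
        k4 = rhs(eta_B + h * k3[0], eta_R + h * k3[1])
        eta_B += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        eta_R += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return eta_B, eta_R


# Symmetric case: both sides decay as 8 * exp(-0.08 * t).
b_sym, r_sym = integrate_mari(8.0, 8.0, 0.08, 0.08, t_end=5.0, steps=100)

# Unequal numbers: lam_R * eta_B**2 - lam_B * eta_R**2 stays constant along the solution.
b_asym, r_asym = integrate_mari(8.0, 4.0, 0.08, 0.08, t_end=2.0, steps=100)
```

The conserved quadratic quantity noted in the comment follows directly from differentiating it along (1.11) and (1.12), and gives a convenient sanity check on any integrator used for this model.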

Dependent  Target  Acquisition  (MNCO,  MARD) 

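Consider a short interval $[t, t + \Delta t]$. We may then assume that the number of Blue losses in the interval
is approximated by

$$m\,\sigma^R(n)\,\lambda^B \Delta t \approx \hat{\eta}^R\,\sigma^R(\hat{\eta}^B)\,\lambda^B \Delta t = \hat{\eta}^R \hat{\eta}^B \frac{1}{\bar{\sigma}^R} \lambda^B \Delta t.$$

Then we may approximate

$$\hat{\eta}^B(t + \Delta t) = \hat{\eta}^B(t) - \hat{\eta}^R \hat{\eta}^B \frac{1}{\bar{\sigma}^R} \lambda^B \Delta t,$$

and by taking the limit as $\Delta t \to 0$, we get

$$\dot{\hat{\eta}}^B = -\hat{\eta}^B \hat{\eta}^R \frac{\alpha^R}{\bar{\sigma}^R} P_k^R \psi^R \nu^R.$$

Similarly we have

$$\dot{\hat{\eta}}^R = -\hat{\eta}^R \hat{\eta}^B \frac{\alpha^B}{\bar{\sigma}^B} P_k^B \psi^B \nu^B.$$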
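
The dependent-acquisition ODEs have a bilinear right-hand side. The sketch below integrates them with RK4 under the assumption of constant coefficients; `c_B` and `c_R` are hypothetical names for the lumped Blue and Red loss coefficients (acquisition rate divided by saturation value, times the single-salvo kill probability of the opposing side).

```python
def integrate_mard(eta_B, eta_R, c_B, c_R, t_end, steps):
    """RK4 for d(eta_B)/dt = -c_B * eta_B * eta_R, d(eta_R)/dt = -c_R * eta_B * eta_R."""
    def rhs(b, r):
        product = b * r
        return -c_B * product, -c_R * product

    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(eta_B, eta_R)
        k2 = rhs(eta_B + 0.5 * h * k1[0], eta_R + 0.5 * h * k1[1])
        k3 = rhs(eta_B + 0.5 * h * k2[0], eta_R + 0.5 * h * k2[1])
        k4 = rhs(eta_B + h * k3[0], eta_R + h * k3[1])
        eta_B += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        eta_R += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return eta_B, eta_R


# Symmetric example: c = (0.1 / 8) * 0.8 = 0.01 and eight platforms per side,
# for which the exact solution is x(t) = 8 / (1 + 0.08 * t).
b_sym, r_sym = integrate_mard(8.0, 8.0, 0.01, 0.01, t_end=5.0, steps=200)

# In general c_R * eta_B - c_B * eta_R is constant, because both derivatives
# are proportional to the same product eta_B * eta_R.
b_asym, r_asym = integrate_mard(8.0, 4.0, 0.01, 0.02, t_end=3.0, steps=200)
```

Compared with the independent-acquisition case, losses slow down as either side thins out, since the loss rate scales with the product of the two force levels.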


1.8.2  Coordinated  Target  Selection 
No  Wrap-Around  (MWCO,  MRTI,  MNWA) 

In each round, both sides have $\min(\eta^B, \eta^R)$ expected shots, so we would expect Red to lose

$\min(\eta^B, \eta^R)\,P_k^B \psi^B \nu^B,$

and expect Blue to lose

$\min(\eta^B, \eta^R)\,P_k^R \psi^R \nu^R,$

in each round. In a short interval $\Delta t$, the expected number of rounds is $\rho \Delta t$. Assuming that during these
rounds $\eta^B$ and $\eta^R$ do not change much (this assumption will not necessarily hold), we may approximate

$\hat{\eta}^B(t + \Delta t) = \hat{\eta}^B(t) - \rho \Delta t\,\min(\hat{\eta}^B, \hat{\eta}^R)\,P_k^R \psi^R \nu^R.$

Then by taking the limit as $\Delta t \to 0$, we obtain the evolution equation for the expected number
of Blue platforms,

$\dot{\hat{\eta}}^B(t) = -\rho\,\min(\hat{\eta}^B, \hat{\eta}^R)\,P_k^R \psi^R \nu^R.$

If it is desired to replace $\min(\hat{\eta}^B, \hat{\eta}^R)$ by a smooth function, one possibility is to use

$$\frac{\min(\hat{\eta}^B, \hat{\eta}^R)}{\hat{\eta}^B} = \min\!\left(\frac{\hat{\eta}^R}{\hat{\eta}^B}, 1\right) \approx 1 - e^{-2\hat{\eta}^R / \hat{\eta}^B}.$$

Then we obtain

$$\dot{\hat{\eta}}^B(t) = -\rho\,\hat{\eta}^B \left[1 - e^{-2\hat{\eta}^R / \hat{\eta}^B}\right] P_k^R \psi^R \nu^R.$$

Similarly, by symmetry, we have

$$\dot{\hat{\eta}}^R(t) = -\rho\,\hat{\eta}^R \left[1 - e^{-2\hat{\eta}^B / \hat{\eta}^R}\right] P_k^B \psi^B \nu^B.$$  (1.16)

However, it was observed that this replacement results in poor approximations when one force has
numerical superiority.
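
A short numerical comparison helps to see where the smooth replacement departs from the exact minimum. The snippet is only a sketch; the function names are illustrative, and `ratio` denotes the opposing force level divided by one's own.

```python
from math import exp


def min_fraction(ratio: float) -> float:
    """Exact factor min(ratio, 1)."""
    return min(ratio, 1.0)


def smooth_fraction(ratio: float) -> float:
    """Smooth replacement 1 - exp(-2 * ratio)."""
    return 1.0 - exp(-2.0 * ratio)


# Tabulate both factors from a heavily outnumbered opponent (ratio 0.1) to parity (ratio 1).
comparison = [(r / 10.0, min_fraction(r / 10.0), smooth_fraction(r / 10.0))
              for r in range(1, 11)]
```

For small ratios the smooth factor behaves like twice the ratio, so it roughly doubles the loss rate of the numerically superior side. This is one plausible reading of why the replacement performs poorly when one force has numerical superiority.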
With Wrap-Around (MWCO, MRTI, MWWA)


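Consider the Blue unit firing on the Red unit. The average number of shots Red will receive per platform
is $n/m$. In each round, the expected Red loss is the sum of the expected losses in the '$f^B$' and '$f^B + 1$'
groups, which is

$$\left[m(f^B + 1) - n\right]\left[1 - \left(1 - P_k^B \psi^B \nu^B\right)^{f^B}\right] + \left[n - m f^B\right]\left[1 - \left(1 - P_k^B \psi^B \nu^B\right)^{f^B + 1}\right].$$

Approximating $f^B$ by $n/m$ (the average number of shots Red will receive), the above expression simplifies to
$m\left[1 - \left(1 - P_k^B \psi^B \nu^B\right)^{n/m}\right]$. Following the same logic as in the first case, we obtain

$$\dot{\hat{\eta}}^R(t) = -\rho\,\hat{\eta}^R \left[1 - \left(1 - P_k^B \psi^B \nu^B\right)^{\hat{\eta}^B / \hat{\eta}^R}\right].$$  (1.17)

Similarly, by symmetry, we have

$$\dot{\hat{\eta}}^B(t) = -\rho\,\hat{\eta}^B \left[1 - \left(1 - P_k^R \psi^R \nu^R\right)^{\hat{\eta}^R / \hat{\eta}^B}\right].$$  (1.18)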
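
The effect of replacing the integer shot count by the average n/m can be checked directly. In this sketch (hypothetical function names, `p` being Blue's single-salvo kill probability) the two-group expectation is compared with the simplified expression.

```python
def expected_red_loss_groups(n: int, m: int, p: float) -> float:
    """Expected Red loss per round from the two wrap-around groups (requires m > 0)."""
    f = n // m
    few = (m * (f + 1) - n) * (1.0 - (1.0 - p) ** f)    # platforms receiving f shots
    many = (n - m * f) * (1.0 - (1.0 - p) ** (f + 1))   # platforms receiving f + 1 shots
    return few + many


def expected_red_loss_average(n: int, m: int, p: float) -> float:
    """Simplified expression m * (1 - (1 - p)**(n / m)) (requires m > 0)."""
    return m * (1.0 - (1.0 - p) ** (n / m))


exact_even = expected_red_loss_groups(8, 4, 0.8)     # m divides n
approx_even = expected_red_loss_average(8, 4, 0.8)
exact_odd = expected_red_loss_groups(8, 3, 0.8)      # m does not divide n
approx_odd = expected_red_loss_average(8, 3, 0.8)
```

When m divides n the two expressions coincide. Otherwise the simplified form is slightly larger, because the per-platform kill probability is a concave function of the number of shots received.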


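1.8.3 Summary of ODEs

Table 1.1 summarizes the ODEs for approximating the expected value of platforms for each model.


Table 1.1: The Approximate Evolution of Expected Number of Platforms

Case                    Approximate Expected Value ODE
Uncoordinated (MARI)    $\dot{\hat{\eta}}^B = -\hat{\eta}^R\,P_k^R \psi^R \nu^R \alpha^R$
Uncoordinated (MARD)    $\dot{\hat{\eta}}^B = -\hat{\eta}^B \hat{\eta}^R \frac{\alpha^R}{\bar{\sigma}^R} P_k^R \psi^R \nu^R$
Coordinated (MNWA)      $\dot{\hat{\eta}}^B = -\rho\,\min(\hat{\eta}^B, \hat{\eta}^R)\,P_k^R \psi^R \nu^R$
Coordinated (MWWA)      $\dot{\hat{\eta}}^B = -\rho\,\hat{\eta}^B \left[1 - \left(1 - P_k^R \psi^R \nu^R\right)^{\hat{\eta}^R / \hat{\eta}^B}\right]$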

1.9  Weapons  Expenditure 

When a platform with a large number of weapons is shot down, the salvo loads per platform $W(t)$ will
decrease faster than when a platform with fewer weapons is shot down. Therefore, for
an  exact  prediction  of  the  probability  distribution  of  W(t)  one  needs  to  develop  a  Markov  chain  model 
for  the  number  of  weapons  in  each  platform  individually,  and  then  calculate  the  distribution  of  W(t) 
from  these.  This  is  not  only  computationally  very  costly,  but  also  only  marginally  useful  for  the  overall 
project  objective  of  developing  a  game  theoretic  controller.  This  is  because  W(t)  does  not  enter  the  cost 
function,  and  it  becomes  significant,  per  (1.1),  only  when  a  unit  depletes  all  of  its  weapons.  Considering 
this,  we  will  take  a  shortcut  by  introducing  the  following  simplifying  assumption: 

SWES:  Since  the  target  acquisition  or  round  arrival  rate,  the  fire  intensity  controls  and  the  salvo 
sizes  are  the  same  for  all  platforms  in  a  unit,  all  platforms  will  be  assumed  to  have  the  same  number 
of  weapons  on  board  at  any  given  time. 

Because of SWES, when a platform is killed, $W(t)$ does not change. The salvo loads per platform of
the unit obey, for small $\Delta t$,

$W(t + \Delta t) - W(t) = -(\text{number of salvos fired in } [t, t + \Delta t]).$  (1.19)
1.9.1 Uncoordinated Target Selection

For both the independent target acquisition case (MNCO, MARI) and the linearly dependent case
(MNCO, MARD), the firing rate for a Blue platform will be $\sigma^B(X^R)\,\alpha^B \nu^B$. The expected number of
salvos fired per Blue platform in $[t, t + \Delta t]$ will be $\sigma^B(\eta^R)\,\alpha^B \nu^B \Delta t$. Taking expected values in (1.19),
dividing by $\Delta t$ and taking the limit $\Delta t \to 0$ yields

$\dot{\zeta}^B(t) = -\sigma^B(\eta^R(t))\,\alpha^B \nu^B(t).$  (1.20)

A  similar  equation  holds  for  the  Red  unit. 


1.9.2  Coordinated  Target  Selection 

For the case without wrap-around (MWCO, MRTI, MNWA), the assumption SWES may not hold
for a few rounds. In the long run, if the platforms which are selected to fire on the enemy are rotated
regularly between rounds, one can still employ this assumption. In that case, the Blue unit will fire, on
average,

$$\frac{\min(\eta^B, \eta^R)}{\eta^B}\,\nu^B = \min\!\left(1, \frac{\eta^R}{\eta^B}\right)\nu^B$$

shots per platform, in each round. Then, the weapons dynamics will be

$$\dot{\zeta}^B(t) = -\rho\,\min\!\left(1, \frac{\eta^R(t)}{\eta^B(t)}\right)\nu^B(t).$$  (1.21)

For the case with wrap-around (MWCO, MRTI, MWWA), SWES is more reasonable. Since
all Blue platforms will have target assignments, the expected number of shots per platform in the unit is
$\nu^B$ per round. Therefore the weapons dynamics are

$\dot{\zeta}^B(t) = -\rho\,\nu^B(t).$  (1.22)


1.10  Experiment  Results  and  Analysis 


Table 1.2: Scenarios Used For Experiments

Scenario Types; $N^B = 8$, $P_k^B = 0.8$

Scenario    Setting
A           $N^B = N^R$,  $P_k^B = P_k^R$
B           $N^B = 2N^R$, $P_k^B = P_k^R$
C           $N^B = N^R$,  $P_k^B = 2P_k^R$

To  compare  each  of  the  models  we  simulate  each  of  the  above  scenarios.  We  note  that  each  model 
shows  the  principle  of  force  concentration.  That  is, 

•  When  both  forces  have  equal  loss  rates  and  equal  number  of  platforms,  the  dynamics  for  each  force 
are  exactly  equal, 

•  When  one  force  dominates  in  numbers  over  the  other  force,  the  dominating  force  has  fewer  casualties. 

There are three types of scenarios, which are summarized in Table 1.2. For both uncoordinated and
coordinated target selection, we use $\psi = 1$ (the units do not move) and $\nu = 1$ (the units always fire at
an acquired target). Also, for MNCO we have $\alpha = 0.1$ and for MWCO we have $\rho = 0.5$. Under the
MARD assumption for uncoordinated target selection, the acquisition rate has a linear dependence on
the number of enemy platforms. That is,

$\bar{\sigma}^B = N^R$ and $\bar{\sigma}^R = N^B$.

Also, we assume that there are enough weapons on each platform to last through the simulation, so
weapon expenditure is not considered in these experiments.

To  compare  the  models,  we  must  first  understand  how  each  model  behaves  in  a  realistic  battlefield 
(with  our  given  scenario). 

Uncoordinated (MARI) Independent acquisition rate implies $\bar{\sigma} = 1$. Whenever a platform
acquires a target, the platform will fire one salvo at the target with a probability of kill $P_k$.
Whenever a target is destroyed, the platform will acquire a new target. This will go on until the
simulation ends, or until the platform is destroyed.

Uncoordinated (MARD) The dynamics are the same as MARI except for how a platform
acquires a target. That is, $\sigma^B(m) = m / N^R$ and $\sigma^R(n) = n / N^B$, where $m$ is the number of Red
platforms and $n$ is the number of Blue platforms at some time $t$. When targets are sparse, the
acquisition rate is smaller.

Coordinated (MNWA) At the beginning of each round, the maximum number of platforms
targeted by the enemy is equal to the number of platforms in the smaller unit. For our scenarios,
there will be $\min(N^B, N^R)$ Red targets for Blue and $\min(N^B, N^R)$ Blue targets for Red when the
simulation begins. When a round begins, both sides fire simultaneously.

Coordinated (MWWA) For the simulation we have targeting as defined in Section 1.7.2. The
commander will assign a unique target to each of his platforms until all targets have been targeted;
then the remaining platforms will start targeting from the beginning of the list of targets until all
platforms have an enemy target. When a round begins, both sides fire simultaneously.

Figures 1.1, 1.3 and 1.5 plot the probability distribution at selected simulation times
for uncoordinated target selection. In each scenario we can see that the probability distributions for the
MARD cases show lower covariances over the simulation than the MARI cases. This is expected since
the independent acquisition rate case allows for more acquisitions per unit time than the dependent case
when targets become sparse.

Next we shift our attention to the coordinated target selection models. From Figures 1.2, 1.4 and 1.6
we see less variation in the distribution for the MWWA cases than for the MNWA cases. The differences
between the models explain why this happens. Since the 'wrap-around' case allows a
target to be selected by more than one enemy platform, the dominating force
obtains better chances of winning than the weaker force. With 'no wrap-around', the force that wins
is the force with the better probability of killing the enemy target.

There are definite differences between the distributions of uncoordinated and coordinated target
selection models. In the uncoordinated target selection case, at any given time, only one platform can be
destroyed. On the other hand, in the coordinated target selection model, after a round, a whole unit has
a non-zero probability of being destroyed.

After seeing how the probability distribution evolves for each model, we need to examine the expected
values of the number of platforms for both sides and compare them with the approximated expected
values. We want to examine how much of the battle dynamics can be seen from the
approximated expected values. The derivation of the approximate expected values does not guarantee
reliable prediction over a long horizon.

Tables 1.3 and 1.4 summarize the covariances and the approximation error in the expected values
using the $L_2$ norm for the duration of the engagement, for all models and scenarios.

In  Figures  1.7-1.12  we  have  the  expected  values  (with  covariances)  for  the  uncoordinated  target 
selection  case.  There  are,  of  course,  small  differences  in  the  exact  values  from  MARI  to  MARD.  Yet 
we  see  that  the  approximation  for  MARD  is  much  better  since  it  has  an  overall  better  prediction  of 
Figure 1.1: Evolution of the probability distribution for Uncoordinated Target Selection for Scenario
A ((a): MARI and (b): MARD).

Figure 1.2: Evolution of the probability distribution for Coordinated Target Selection for Scenario A
((a): No Wrap-Around and (b): Wrap-Around).

Figure 1.3: Evolution of the probability distribution for Uncoordinated Target Selection for Scenario
B ((a): MARI and (b): MARD).

Figure 1.4: Evolution of the probability distribution for Coordinated Target Selection for Scenario B
((a): No Wrap-Around and (b): Wrap-Around).

Figure 1.5: Evolution of the probability distribution for Uncoordinated Target Selection for Scenario
C ((a): MARI and (b): MARD).

Figure 1.6: Evolution of the probability distribution for Coordinated Target Selection for Scenario C
((a): No Wrap-Around and (b): Wrap-Around).
Table 1.3: L2-Norm of Covariances

Exp. #                    1      2      3      4      5      6      7      8      9     10     11     12
Model                  MARI   MARI   MARI   MARD   MARD   MARD   MNWA   MNWA   MNWA   MWWA   MWWA   MWWA
Scenario                  A      B      C      A      B      C      A      B      C      A      B      C
$\sigma_B^2$          10.16   4.09   4.90   5.33   3.19   3.46  20.83   8.79  10.54  19.18   5.31  10.16
$\sigma_R^2$          10.16   3.85   9.81   5.33   2.80   5.26  20.83   8.36  21.67  19.18   5.79  20.12
$\sigma_B \sigma_R$    5.82   1.65   2.93   2.09   1.05   1.30  18.42   7.70  10.15  16.80   4.74   9.28

Table 1.4: L2-Norm of Error Between Actual and Approximate Expected Values

Exp. #                            1      2      3      4      5      6      7      8      9     10     11     12
Model                          MARI   MARI   MARI   MARD   MARD   MARD   MNWA   MNWA   MNWA   MWWA   MWWA   MWWA
$\|\hat\eta^B - \eta^B\|$      0.14   0.37   0.01   0.06   0.03   0.02   1.35   5.80   0.02   1.53   4.74   0.16
$\|\hat\eta^R - \eta^R\|$      0.14   1.35   0.25   0.06   0.06   0.04   1.35   5.80   0.04   1.53   0.06   0.58
Sum of the two norms           0.28   1.72   0.26   0.12   0.09   0.06   2.70  11.60   0.06   3.06   4.80   0.74

the outcome over the simulation. The approximation is not always this good, yet over other simulations the approximation for MARD remains better than that for MARI. One reason is the acquisition rate. In the MARI case, platforms acquire targets at a constant rate, independent of the number of enemy platforms. In the MARD case, platforms acquire targets at a rate that depends on the number of enemy platforms. As targets become sparse, the dynamics of the dependent case therefore "slow down": the number of targets acquired per unit time decreases. This allows the approximation to remain good as targets become sparse.
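The "slow down" effect can be illustrated with a pair of Lanchester-type attrition equations. This is only a sketch under stated assumptions: the two rate laws below are stand-ins chosen to mimic a constant versus an enemy-count-dependent acquisition rate, not the ODE models derived in this report, and the rate constant and force sizes are hypothetical.

```python
def simulate(blue0, red0, rho, dependent, t_end=20.0, dt=0.01):
    """Forward-Euler integration of two illustrative attrition laws.

    dependent=False: each platform acquires targets at a constant rate rho
                     as long as any enemy platform remains (MARI-like).
    dependent=True:  the per-platform rate is scaled by the surviving
                     fraction of the enemy force (MARD-like).
    """
    blue, red = float(blue0), float(red0)
    history = [(0.0, blue, red)]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        if dependent:
            loss_red = rho * blue * (red / red0)
            loss_blue = rho * red * (blue / blue0)
        else:
            loss_red = rho * blue if red > 0 else 0.0
            loss_blue = rho * red if blue > 0 else 0.0
        # Both losses were computed from the old state; clamp at zero.
        blue = max(blue - loss_blue * dt, 0.0)
        red = max(red - loss_red * dt, 0.0)
        history.append((k * dt, blue, red))
    return history

mari_like = simulate(10, 8, 0.1, dependent=False)
mard_like = simulate(10, 8, 0.1, dependent=True)
```

In the constant-rate run the weaker side is driven to zero in finite time, where the expected-value dynamics change abruptly. In the dependent run the loss rate shrinks with the number of remaining targets, so the trajectories decay smoothly and never reach zero.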

Lastly, Figures 1.13-1.18 illustrate the expected values (with covariances) for the coordinated target selection case. Here the actual expected values differ little between the two models; the notable difference is in the covariances. The covariances for the 'wrap-around' case are smaller at the end of the simulation time than those for the 'no wrap-around' case. Other simulations show that the 'wrap-around' case has smaller covariances over a significant portion of the simulation time.


1.11  Conclusions  and  Recommendations 

These results indicate that the ODE models were good approximations under the uncoordinated target selection assumption, and were found to be sufficient to represent the attrition dynamics in a differential game setup in this case. The discrepancy between the MC and ODE models increases as the engagement proceeds, which should be expected. Under the coordinated target selection assumption, the ODE approximations were worse. This can be partially explained by the fact that coordination implies firing in rounds, and therefore platform loss is more discrete in nature.

An  extension  of  the  above  modeling  procedure  to  forces  with  multiple  units  is  not  immediate.  There 
are  at  least  two  issues  that  need  to  be  addressed: 
•  A unit may engage in a firefight with more than one enemy unit at a given time. In this case, the fire intensity command for this unit must be replaced by a vector of commands, with length equal to the number of enemy units. From a modeling perspective this may be acceptable. However, this "proliferation of inputs" is quite undesirable for control design.

•  The  state-space  of  the  resulting  Markov  Chain  will  grow  geometrically  with  the  number  of  units. 
As  a  result,  verification  of  approximation  of  expected  values  via  simulation  may  become  infeasible. 
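A rough count shows how quickly the state space grows. The sketch below assumes, for illustration only, that the joint state is just the number of surviving platforms in each unit (weapon counts and round timing are ignored); the function name and unit sizes are hypothetical.

```python
def joint_state_count(platforms_per_unit):
    """Number of joint states when each unit's surviving-platform count is tracked."""
    count = 1
    for n in platforms_per_unit:
        count *= n + 1  # a unit with n platforms can have 0..n survivors
    return count

one_on_one = joint_state_count([10, 10])          # one Blue unit, one Red unit
two_on_two = joint_state_count([10, 10, 10, 10])  # two units per side
```

With 10 platforms per unit, a one-on-one engagement has 121 states and a two-on-two engagement has 14641. Each added unit multiplies the count by 11, and the transition matrix grows with the square of that number.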

A  multi-unit  model,  based  on  heuristic  arguments,  is  included  as  an  appendix  to  this  chapter. 

As  mentioned  in  Section  1.7,  round  arrival  may  depend  on  platform  loss  (MWCO,  MRTK).  In  this 
case,  one  needs  to  consider  a  random  variable  for  the  time  it  takes  one  platform  to  kill  its  assigned  target. 
In  most  cases,  the  distribution  of  this  variable  will  not  be  exponential  and  the  resulting  stochastic  process 
may  not  be  a  Markov  Chain.  Further  investigation  of  this  topic,  with  several  likely  distributions  for  the 
kill  time,  would  be  interesting. 

It  is  also  of  interest  to  model  attrition  when  fighting  units  have  different  rules  about  target  selection 
and  coordination.  For  instance,  the  platforms  of  one  unit  may  be  coordinated  (and  hence  will  fire  in 
rounds),  while  the  platforms  of  the  enemy  unit  are  uncoordinated  (and  hence  will  fire  continuously  as 
targets  are  acquired). 

Another  research  direction  would  be  to  develop  better  approximations  for  the  expected  values  by 
employing  model  reduction  tools,  such  as  singular  perturbation  analysis  or  principal  component  analysis. 
Whether  more  sophisticated  mathematics  will  result  in  better  overall  predictive  capability  will,  no  doubt, 
depend  on  the  nature  of  assumptions  involved. 
Figure 1.7: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARI, Experiment 1.1). Panels: Expectation (Blue), Expectation (Red), Covariance; horizontal axis: Time (min).

Figure 1.8: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARI, Experiment 1.2)

Figure 1.9: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARI, Experiment 1.3)

Figure 1.10: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARD, Experiment 1.4)

Figure 1.11: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARD, Experiment 1.5)

Figure 1.12: Evolution of the expected values (actual and approximated) and covariances for Uncoordinated Target Selection (MARD, Experiment 1.6)

Figure 1.14: Evolution of the expected values (actual and approximated) and covariances for Coordinated Target Selection (MNWA, Experiment 1.8)

Figure 1.16: Evolution of the expected values (actual and approximated) and covariances for Coordinated Target Selection (MWWA, Experiment 1.10)

Figure 1.17: Evolution of the expected values (actual and approximated) and covariances for Coordinated Target Selection (MWWA, Experiment 1.11)

Figure 1.18: Evolution of the expected values (actual and approximated) and covariances for Coordinated Target Selection (MWWA, Experiment 1.12)

Figure 1.19: Comparison of Expected Values for Experiment 1.8 with approximated ODE's using $\min(\eta^B, \eta^R)$ and e** .

1.12  Appendix:  Mission  Dynamics  Continuous-Time  Model  3.0

1.12.1  Modeling  Assumptions 

This section describes the Mission Dynamics Continuous-time Model version 3.0 (MDCM 3.0), a multi-unit extension of the ODE model that approximates the expected values of the number of platforms for the uncoordinated target selection, independent acquisition rate case.

As usual, we denote the quantities which belong to the forces Blue and Red with superscripts $B$ and $R$. If the symbols in an expression do not have superscripts, the expression is valid for either force.

Consider  two  (homogeneous)  units  of  the  opposing  forces  engaged  in  a  fire  fight,  in  which  platforms 
of  both  sides  are  shooting  at  each  other  simultaneously.  During  this  engagement,  each  platform  searches 
for  an  enemy  platform  (in  the  sky,  on  the  ground,  etc.).  When  a  platform  is  located  in  space,  identified 
to  be  an  enemy  and  the  weapon  system  is  locked  on  to  this  platform,  a  target  is  said  to  be  acquired. 

The state variables for each unit are its position $\xi \in \mathbb{R}^2$, the expected value of the number of platforms in the unit $\eta \ge 0$, and the expected value of the number of salvo loads of weapons per platform $\zeta \ge 0$.

In addition to the speed controls $\mu \in [-1, 1]^2$, each unit has a fire intensity control $\pi \in [0, 1]$. One interpretation is as follows: Suppose an object is located and identified as a target by our detection device. This may be a false detection due to the enemy's electronic warfare, or the object may in fact be a decoy. Then $\pi$ reflects the mission commander's confidence in the detection device under the given circumstances and allows provident use of weapons. In this way, $\pi$ becomes the frequency of firing a salvo per target acquisition (i.e., the probability of firing, given that a target is acquired).

Consider the Blue unit firing on the Red unit. Let the probability of kill for each weapon, given that it is fired, be $P\{\mathrm{wkill}^B \mid \mathrm{fired}^B\}$. The probability of killing the target with a salvo of $s^B$ weapons, given that they are fired simultaneously, is

$$
P^B \stackrel{\mathrm{def}}{=} P\{\mathrm{kill}^B \mid \mathrm{fired}^B\} =
\begin{cases}
1 - \left(1 - P\{\mathrm{wkill}^B \mid \mathrm{fired}^B\}\right)^{s^B} & \text{if } \zeta^B(t) > 0, \\
0 & \text{if } \zeta^B(t) = 0.
\end{cases}
\tag{1.23}
$$
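Equation (1.23) is straightforward to evaluate numerically. A minimal sketch (the function and argument names are hypothetical; the value 0.8 matches `pm.pkillB` in the scenario files of Chapter 2):

```python
def salvo_kill_probability(p_weapon_kill, salvo_size, salvo_loads_left):
    """Probability that a salvo kills its target, following (1.23)."""
    if not 0.0 <= p_weapon_kill <= 1.0:
        raise ValueError("p_weapon_kill must be a probability")
    if salvo_loads_left <= 0:
        return 0.0  # no weapons left, so nothing can be fired
    # The target survives only if every weapon in the salvo misses.
    return 1.0 - (1.0 - p_weapon_kill) ** salvo_size

single = salvo_kill_probability(0.8, 1, 10)
double = salvo_kill_probability(0.8, 2, 10)
empty = salvo_kill_probability(0.8, 2, 0)
```

A single weapon with a kill probability of 0.8 gives 0.8, a salvo of two gives 1 - 0.2^2 = 0.96, and a depleted platform gives 0.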

The  following  were  assumed  to  hold  when  deriving  the  Markov  chain  model  in  [1]: 

MNCO (No Coordination): Friendly platforms do not communicate for target selection (uncoordinated selection). The exception is when a platform which has depleted its supply of weapons locates and identifies a target. In that case, this platform will relay this information to a friendly platform in the same unit which still has weapons.

MARI (Acquisition Rate Independent): Target acquisition rate does not depend on the number of enemy platforms or their distribution in space. (Search devices are advanced enough that they will locate enemy platforms efficiently even when they are distributed sparsely.)

MPKD (Probability of Kill depends on Distance): Probability of killing the target depends on the distance between the units.

MNWT (Negligible Weapon reach Time): The time it takes for a missile, bomb, or other weapon to reach its target is negligible.

MNSA (No Self-Attrition): Self-attrition or equipment breakdowns are negligible.

From MPKD, (1.23) will depend on the distance between units through a function $\varphi : \mathbb{R} \to [0, 1]$ of the positions of each unit, $\varphi^B(\|\xi^B - \xi^R\|)$. Let us take this function as $\varphi(r) = \exp(-(r/r_0)^2)$, where $r_0$ is a parameter which depends on the type of units. Therefore the probability of a platform killing its target, given that it is assigned to a target, is $P \varphi \pi$.
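The distance factor and the resulting per-assignment kill probability can be sketched as follows. The function names are hypothetical; the value of 5 km for the distance parameter is the one assigned to `pm.rzeroB` in the scenario files of Chapter 2.

```python
import math

def distance_factor(r, r_zero):
    """Distance factor exp(-(r / r_zero)^2), equal to 1 at zero range."""
    return math.exp(-(r / r_zero) ** 2)

def assigned_kill_probability(p_salvo, r, r_zero, fire_intensity):
    """Product of salvo kill probability, distance factor and fire intensity."""
    return p_salvo * distance_factor(r, r_zero) * fire_intensity

at_contact = assigned_kill_probability(0.8, 0.0, 5.0, 1.0)
at_r_zero = assigned_kill_probability(0.8, 5.0, 5.0, 1.0)
far_away = assigned_kill_probability(0.8, 60.0, 5.0, 1.0)
```

At a range equal to the distance parameter the factor has dropped to exp(-1), about 0.37, and at the initial separations used in Chapter 2 (tens of kilometres) it is effectively zero, so attrition only starts once the units close in.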
1.12.2  State  Equations

Although the models in [1] were derived for a one-on-one engagement, we will assume that the extension to the multiple-unit case can be obtained simply by considering multiple, simultaneous, independent one-on-one engagements. As usual, we will use subscripts to index units. The first subscript indicates the "shooter" and the second one indicates the "shootee". The maximum speed of unit $i$, in both the $x$ and $y$ directions, is $\alpha_i$. Denote by $\rho_{ij}$ the rate at which a platform in unit $i$ acquires a platform in unit $j$ as a target. We include the possibility that a unit may fire at more than one enemy unit, and that it may be fired upon by more than one enemy unit at the same time. However, these target assignments are fixed during the mission. The indices of the Red units that Blue unit $i$ is shooting at ("the shootees of Blue $i$") are denoted by $f^B(i)$. Thus, the firing intensity control for this Blue unit has $|f^B(i)|$ components, where $|\cdot|$ denotes the cardinality of a set. The $k$th component of the firing intensity of Blue unit $i$, $\pi^B_{ik} \in [0, 1]$, corresponds to its fire against the $j$th Red unit, in which $j$ is the $k$th element of $f^B(i)$. The indices of the Red units which are shooting at Blue unit $i$ ("the shooters against Blue $i$") are denoted by $\bar{f}^B(i)$.

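Here, only the dynamics for Blue unit $i$ will be displayed. The corresponding equations for a Red unit can be obtained using the symmetry between the forces, by interchanging $B$ with $R$. The motion on the plane is given by

$$\frac{d}{dt}\xi^B_{i,x}(t) = \alpha^B_i \mu^B_{i,x}(t), \tag{1.24}$$

$$\frac{d}{dt}\xi^B_{i,y}(t) = \alpha^B_i \mu^B_{i,y}(t). \tag{1.25}$$

The platform loss evolves as

$$\frac{d}{dt}\eta^B_i(t) = -\sum_{j \in \bar{f}^B(i)} \rho^R_{ji}\, \eta^R_j(t)\, P^R_{ji}\, \varphi^R_{ji}\!\left(\|\xi^R_j(t) - \xi^B_i(t)\|\right) \pi^R_{jk}(t), \tag{1.26}$$

where $i$ is the $k$th element of $f^R(j)$. The weapon expenditure is given by

$$\frac{d}{dt}\zeta^B_i(t) = -\operatorname{sign}\!\left(\eta^B_i(t)\right) \sum_{k=1}^{|f^B(i)|} (\ldots), \tag{1.27}$$

where $j$ is the $k$th element of $f^B(i)$.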
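For a single Blue unit against a single Red unit, the motion and platform-loss equations can be stepped forward numerically. The sketch below is an illustration under assumptions, not the report's simulation code: it uses forward Euler, treats the platform loss rate as the product of acquisition rate, enemy platforms, kill probability, distance factor and enemy fire intensity, and omits the weapon-expenditure state. The dictionary layout, the function name and the acquisition rate `rho = 1.0` are hypothetical.

```python
import math

def mdcm_euler_step(blue, red, dt, rho=1.0, p_kill=0.8, r_zero=5.0):
    """One forward-Euler step of position and platform count for both units."""
    dist = math.hypot(blue["x"] - red["x"], blue["y"] - red["y"])
    phi = math.exp(-(dist / r_zero) ** 2)  # distance factor

    def advance(own, enemy):
        # Platforms are lost at a rate driven by the enemy's platforms and fire.
        loss_rate = rho * enemy["eta"] * p_kill * phi * enemy["pi"]
        nxt = dict(own)
        nxt["x"] = own["x"] + dt * own["alpha"] * own["mu"][0]
        nxt["y"] = own["y"] + dt * own["alpha"] * own["mu"][1]
        nxt["eta"] = max(own["eta"] - dt * loss_rate, 0.0)
        return nxt

    return advance(blue, red), advance(red, blue)

# Initial positions as in the Joust scenario file; controls are hypothetical.
blue = {"x": 20.0, "y": 50.0, "eta": 10.0, "pi": 1.0, "alpha": 10.0, "mu": (1.0, 0.0)}
red = {"x": 80.0, "y": 52.0, "eta": 10.0, "pi": 1.0, "alpha": 10.0, "mu": (-1.0, 0.0)}
blue_next, red_next = mdcm_euler_step(blue, red, dt=0.1)
```

At the initial separation of about 60 km the distance factor is negligible, so after one step only the positions change; losses appear once the units are within a few multiples of the distance parameter.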
Chapter  2


Experiment  2:  Controller  Performance  Comparison  with  Other  Controllers

2.1  Executive  Summary 

This is the experiment for Hypothesis 2. The plant and internal models are the same, namely the Mission Dynamics Continuous-time Model (MDCM). No noise is added to the state variables when constructing the observed state variables (the output variables). The control actions of the Blue and Red teams are generated by one of the following strategies: the proposed game theoretic algorithm, a simple heuristic stochastic strategy (e.g., with a movement bias toward targets), a simple heuristic deterministic strategy, and a human planner.

The strategy adopted by Blue and Red is optimal in the sense of a Nash equilibrium with respect to the value function; that is, it maximizes the value function with respect to Red and minimizes it with respect to Blue.
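The min-max property can be illustrated on a small matrix game. This is only a finite, discretized analogy to the differential game used in the experiments: the payoff values are hypothetical, Blue (the minimizer) picks a row and Red (the maximizer) picks a column.

```python
def pure_saddle_points(payoff):
    """Return (row, col, value) entries from which neither player gains by deviating."""
    points = []
    for i, row in enumerate(payoff):
        for j, value in enumerate(row):
            column = [payoff[k][j] for k in range(len(payoff))]
            red_cannot_improve = value == max(row)      # Red maximizes over columns
            blue_cannot_improve = value == min(column)  # Blue minimizes over rows
            if red_cannot_improve and blue_cannot_improve:
                points.append((i, j, value))
    return points

payoff = [[3, 5],
          [2, 4]]
saddles = pure_saddle_points(payoff)
```

Here the only saddle point is row 1, column 1 with value 4. If either player deviates unilaterally, the outcome moves in the other player's favor, which is the sense in which the game theoretic strategy pair is optimal.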


2.2  Experiment  Scope 

We  did  a  series  of  experiments  to  evaluate  the  effectiveness  of  the  current  differential  game  technology 
as  a  means  of  countering  the  enemy  actions  under  idealized  situations  with  perfect  information  about 
enemy  states,  initial  conditions  and  objectives. 

For the experiments we considered two different one vs. one scenarios and four different cases corresponding to each scenario. In all cases the strategy for the blue player was determined by a game theoretic controller, with running cost on the velocity controls and the distance to the target, and terminal cost on the final number of platforms.

Case  1  (for  each  of  the  two  scenarios)  represents  the  baseline  for  comparison,  as  in  this  case  the  actions 
for  red  were  also  determined  using  a  game  theoretic  controller. 

For Case 2, an open loop, heuristic-deterministic strategy was adopted for the red player. The idea behind it can be summarized as follows: reach the target for the red units while avoiding confrontation by taking an indirect route towards the target, such that any possible encounter with the blue units will occur as far as possible from the target of the blue units. This strategy is consistent with the weights on the payoff function selected for the blue player, except that no effort is made to minimize the control effort.

For  Case  3,  a  heuristic-stochastic  strategy  was  adopted  for  the  red  player.  Basically,  a  direct  route  was 
plotted  for  the  red  units  to  follow  towards  their  target,  with  random  shifts  in  direction,  velocity  and  firing 
intensity  introduced  at  several  intervals  along  the  route. 
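A heuristic-stochastic strategy of this kind could be generated as sketched below. The report does not give the actual perturbation scheme, so the number of intervals, the maximum heading shift and the speed range are hypothetical; the outputs respect the control bounds of the model (velocity components in [-1, 1], firing intensity in [0, 1]).

```python
import math
import random

def stochastic_heuristic_controls(start, target, n_intervals, rng,
                                  max_turn=math.pi / 6):
    """Open-loop controls along the direct route, randomly perturbed per interval."""
    direct_heading = math.atan2(target[1] - start[1], target[0] - start[0])
    controls = []
    for _ in range(n_intervals):
        heading = direct_heading + rng.uniform(-max_turn, max_turn)
        speed = rng.uniform(0.5, 1.0)   # fraction of the maximum speed
        fire = rng.uniform(0.0, 1.0)    # firing intensity
        controls.append((speed * math.cos(heading),
                         speed * math.sin(heading),
                         fire))
    return controls

# Red start and target positions as in the Cross scenario file.
controls = stochastic_heuristic_controls((50.0, 80.0), (50.0, 20.0), 5,
                                         random.Random(0))
```

Each tuple is (velocity x, velocity y, firing intensity) for one interval. Passing a seeded `random.Random` makes a given run reproducible.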

Finally, for Case 4 a human planner assumed the control of the red units, trying to accomplish the same objectives as before.

Table 2.1: List of Scenarios

                            Blue                        Red
Case 1 (cross 1, joust 1)   Game theoretic controller   Game theoretic controller
Case 2 (cross 2, joust 2)   Game theoretic controller   Deterministic heuristic controller
Case 3 (cross 3, joust 3)   Game theoretic controller   Stochastic heuristic controller
Case 4 (cross 4, joust 4)   Game theoretic controller   Human planner

Figure 2.1: Cross 1: Trajectories (B1: interceptor, R1: bomber)



2.3  Experiment  Results 

2.3.1  Scenario  One:  Cross 
Figure 2.2: Cross 1: Firing Intensities

Figure 2.3: Cross 1: Velocities

Figure 2.4: Cross 1: Number of Platforms

Figure 2.5: Cross 2: Trajectories (B1: interceptor, R1: bomber)

Figure 2.8: Cross 2: Number of Platforms

Figure 2.9: Cross 3: Trajectories (B1: interceptor, R1: bomber)

Figure 2.12: Cross 3: Number of Platforms

Figure 2.13: Cross 4: Trajectories (B1: interceptor, R1: bomber)

Figure 2.15: Cross 4: Velocities

Figure 2.16: Cross 4: Number of Platforms


2.3.2  Scenario  Two:  Joust 
Figure 2.17: Joust 1: Trajectories

Figure 2.18: Joust 1: Firing Intensities

Figure 2.19: Joust 1: Velocities

Figure 2.20: Joust 1: Number of Platforms

Figure 2.21: Joust 2: Trajectories (B1: interceptor, R1: bomber)

Figure 2.22: Joust 2: Firing Intensities

Figure 2.25: Joust 3: Trajectories (B1: interceptor, R1: bomber)

Figure 2.26: Joust 3: Firing Intensities

Figure 2.29: Joust 4: Trajectories (B1: interceptor, R1: bomber; Time: 20.0 min)

Figure 2.30: Joust 4: Firing Intensities

Table 2.2: Scenario One - Cross (Parameter Values)

        Initial Conditions          Weights
        Platform   X0     Y0        Speed   Distance   Final Platform
Blue    10         80     50        200     0.1        0.2
Red     10         50     20        200     0.1        20

Table 2.3: Scenario Two - Joust (Parameter Values)

        Initial Conditions          Weights
        Platform   X0     Y0        Speed   Distance   Final Platform
Blue    10         80     50        150     0.1        0.2
Red     10         20     50        150     0.1        20

2.4  Conclusions 

Tables 2.4 and 2.6 show that, by using control strategies other than the one based on differential game theory, it was possible to improve the performance of the red player with respect to the number of platform losses and the final distance to the target.

Tables 2.5 and 2.7 show that the total cost that the red player was trying to maximize, according to the game theoretic controller, was indeed maximum in Case 1, when the game theoretic controller was used to determine the strategy of both players. Furthermore, the total value of the game (red minus blue) was also a maximum for Case 1.
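These statements can be checked directly against the tabulated totals. The short script below copies the Blue total, Red total and Game Value columns from Tables 2.5 and 2.7; the data layout and function name are hypothetical.

```python
# (blue_total, red_total, game_value) per case, copied from Tables 2.5 and 2.7.
cost_tables = {
    "cross": {1: (8735, -2685, 6050), 2: (7791, -5514, 2277),
              3: (8824, -4063, 4761), 4: (9989, -4611, 5378)},
    "joust": {1: (9140, -2943, 6197), 2: (9192, -3192, 6000),
              3: (9791, -3750, 6041), 4: (8941, -3352, 5589)},
}

def summarize(cases):
    """Check the totals and find the cases with the largest Red total and game value."""
    consistent = all(blue + red == game for blue, red, game in cases.values())
    best_red_case = max(cases, key=lambda c: cases[c][1])
    best_game_case = max(cases, key=lambda c: cases[c][2])
    return consistent, best_red_case, best_game_case

summary = {name: summarize(cases) for name, cases in cost_tables.items()}
```

For both scenarios the tabulated Game Value equals the sum of the Blue and Red totals as printed, and Case 1 has both the largest Red total and the largest game value.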
Table 2.4: Experiment Results for Scenario One - Parameter Values

          BLUE                                              RED
          Platform   Xf     Yf     Plat lost   Distance     Platform   Xf     Yf     Plat lost   Distance
cross 1   7.3        64.8   47.5   2.7         15.4         4.0        47.5   30.1   6.0         10.0
cross 2   7.5        49.9   60.7   2.5         32.0         7.8        50.1   19.8   2.2         2.0
cross 3   5.9        63.4   49.4   4.1         16.6         6.9        49.4   19.9   3.1         6.1
cross 4   6.6        71.8   40.3   3.4         12.7         7.0        49.8   19.6   3.0         4.5

Table 2.5: Experiment Results for Scenario One - Cost Components

          BLUE                                     RED
          Speed   Distance   Final Plat   Total    Speed   Distance   Final Plat   Total    Game Value
cross 1   819     7927       -11          8735     -898    -2110      323          -2685    6050
cross 2   1721    6081       -11          7791     -5346   -7         1219         -5514    2277
cross 3   1033    7798       -7           8824     -2976   -2027      940          -4063    4761
cross 4   1887    8111       -9           9989     -3078   -2508      975          -4611    5378

Table 2.6: Experiment Results for Scenario Two - Parameter Values

          BLUE                                              RED
          Platform   Xf     Yf     Plat lost   Distance     Platform   Xf     Yf     Plat lost   Distance
joust 1   8.4        68.9   59.9   1.6         14.9         5.9        26.8   61.9   4.1         13.7
joust 2   6.3        66.3   33.7   3.7         21.3         6.5        19.8   49.7   3.5         3.6
joust 3   6.2        67.5   48.7   3.8         12.6         6.0        24.9   63.1   4.0         14.0
joust 4   7.8        69.4   49.7   2.2         10.6         7.2        21.8   49.3   2.8         19.3

Table 2.7: Experiment Results for Scenario Two - Cost Components

          BLUE                                     RED
          Speed   Distance   Final Plat   Total    Speed   Distance   Final Plat   Total    Game Value
joust 1   1360    7794       -14          9140     -1453   -2183      693          -2943    6197
joust 2   1727    7473       -8           9192     -3273   -757       837          -3192    6000
joust 3   1761    8038       -8           9791     -2538   -1936      724          -3750    6041
joust 4   885     8068       -12          8941     -1673   -2720      1041         -3352    5589

2.6  Appendix

2.6.1  Scenario  File  for  Cross 

% This scenario file is for experiment 1. Put different weight on red's terminal
% number of platforms, 7th element in g_Qf = diag([0 0 -0.20 0 0 0 20 0]);
%                                                                     40
%                                                                     60
% dR(1)=20 or 40 or 60.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% UNIT PROPERTIES
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% numbers of categories
un.NBc=5; un.NRc=5;

% each category is represented by an integer,
% which will be used for indexing into parameter matrices
% 1: ground troops  2: air defense  3: bombers
% 4: interceptors   5: SEAD

% numbers of units
NB_u=1; NR_u=1; un.NBu=NB_u; un.NRu=NR_u;

% categories of units for each force
% Blue 1 is a bomber unit. Blue 2 is an interceptor unit
% Red 1 is a ground troop unit. Red 2 is an interceptor unit
un.cB=[4]; un.cR=[4];

% descriptive names of units
un.nameB = {'fighter'}; un.nameR = {'interceptor'};

% Assume that one unit can attack only one enemy unit,
% but one unit can be attacked by multiple units
% In this example, B1 shoots at R1, B2 shoots at R2
% R1 shoots at B1, R2 shoots at B2

% matrix form
un.fB=[1]; un.fR=[1];

% from the shooter's perspective:
% Blue unit i shoots at all the Red units with indices un.fsB{i}
un.fsB={1}; un.fsR={1};

% from the shootee's perspective:
% Blue unit i is shot at by all the Red units with indices
% un.feB{i}
un.feB={1}; un.feR={1};

% state vectors for each force
% xB=[xiB(1,1); xiB(1,2); etaB(1); zetaB(1); xiB(2,1); xiB(2,2);
%     etaB(2); zetaB(2)]

% state vector for the whole system
% xsys=[xB; xR]

% initial conditions for states
% [xiB(1,1); xiB(1,2); etaB(1); zetaB(1); xiB(2,1);
%  xiB(2,2); etaB(2); zetaB(2)]
xiB(1,1)=20; xiB(1,2)=50; etaB(1)=10; zetaB(1)=10;
xiR(1,1)=50; xiR(1,2)=80; etaR(1)=10; zetaR(1)=10;
xB_init=[xiB(1,:)'; etaB(1); zetaB(1)];
xR_init=[xiR(1,:)'; etaR(1); zetaR(1)];
x_init=[xB_init; xR_init];

% control vector for a Blue unit i
% uB_i=[muB_ix; muB_iy; piB_i]

% control vector for the Blue force
% uB=[uB(1); uB(2); ...; uB_NBu];

% control vector for the whole system
% usys=[uB; uR]

% parameters for plant simulation blocks, do not edit
% control constraints
numinputs = 3*(un.NBu+un.NRu); numstates = 4*(un.NBu+un.NRu);
contr_uplim = ones(1,numinputs); contr_lolim = [];
for i=1:un.NBu+un.NRu,
    contr_lolim = [contr_lolim, -1, -1, 0];
end;


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% CONSTANT PARAMETERS (WHICH DEPEND ON THE SCENARIO)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% maximum speeds, km/min
pm.alphaB=[0.5; 0; 10; 10; 10]; pm.alphaR=[0.5; 0; 10; 10; 10];

% parameter in probability of engagement function
pm.sigmaB=ones(un.NBc,un.NRc); pm.sigmaR=ones(un.NRc,un.NBc);

% Example: sigma for Blue unit i against Red unit j is
% pm.sigmaB(un.cB(i),un.cR(j))

% modification factor for number of engagements
pm.betaengB=ones(un.NBc,un.NRc); pm.betaengR=ones(un.NRc,un.NBc);
% modification factor for prob kill of a weapon
pm.betawepB=ones(un.NBc,un.NRc); pm.betawepR=ones(un.NRc,un.NBc);
% prob kill of weapon type i against platform type j
pm.pkillB=0.8*ones(un.NBc,un.NRc);
pm.pkillR=0.8*ones(un.NRc,un.NBc);

% salvo size of platform type i shooting at platform type j
pm.salvoB=ones(un.NBc,un.NRc); pm.salvoR=ones(un.NRc,un.NBc);
% parameter of the distance factor (varphi) function
pm.rzeroB=5*ones(un.NBc,un.NRc); pm.rzeroR=5*ones(un.NRc,un.NBc);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TERRAIN INFORMATION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% the rectangular zone for the mission
% coordinates of lower left and upper right corners, in km
% zone_lim=[ xmin xmax;
%            ymin ymax ]
%tr.zone_lim=[ 0 100;
%              0 100];
tr.zone_lim=[[0;0],[100;100]];

% obstacle locations, radii and names
tr.NBo=0; tr.NRo=0;
tr.obsB(1).loc=[0 0]; % x and y, in km
tr.obsB(1).rad=0; tr.obsB(1).name=' ';
tr.obsR(1).loc=[0 0]; % x and y, in km
tr.obsR(1).rad=0; tr.obsR(1).name=' ';

% fixed target locations, sizes and names
tr.NBt=1; tr.NRt=1; qB(1,1)=80; qB(1,2)=50;
tr.tarB(1).loc=[qB(1,1), qB(1,2)]; % x and y, in km
tr.tarB(1).size=0; % not used in this version
tr.tarB(1).name='B1target'; qR(1,1)=50; qR(1,2)=20;
tr.tarR(1).loc=[qR(1,1) qR(1,2)]; % x and y, in km
tr.tarR(1).size=0; tr.tarR(1).name='R1target';

% initial and final times for mission, unit of time is 1 min
t_initfin=[0; 20];

% use all caps for global variables or make them stand out in some way
global NOM_INPUTTRAJ NOM_STATETRAJ NOM_T
load simpnominal11; % load the nominal trajectory


% weights in the cost function of the nonlinear-quadratic game
% min max J(uB,uR)
%  uB  uR
%
% J(uB,uR) = (1/2)*
% integral[ti,tf]{x'*Q*x + 2*q'*x + 2*r1'*uB - 2*r2'*uR +
% uB'*R1*uB - uR'*R2*uR }
%   + (1/2)*x(tf)'*Qf*x(tf) + qf'*x(tf)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

global g_Q g_q g_r1 g_r2 g_R1 g_R2 g_Qf g_qf
% [xiB(1,1); xiB(1,2); etaB(1); zetaB(1); xiB(2,1);
%  xiB(2,2); etaB(2); zetaB(2)]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% weights in the cost function of the nonlinear-quadratic game (MDCM2.55)
% min max J(uB,uR)
%  uB  uR
%
% J(uB,uR)=integral[t0,tf]{sum_{i=1}^{NB_u} aB(i)*(||xiB(i)(t)-qB(i)(t)||)^2 - bB(i)*(etaB(i)(t))^2
%   -sum_{k=1}^NB_o pB(i,k)||xiB(i)(t)-eB(k)||^2 + {uB(i)(t)}'RB(i)uB(i)(t))
%   -sum_{j=1}^{NR_u} aR(j)*(||xiR(j)(t)-qR(j)(t)||)^2 - bR(j)*(etaR(j)(t))^2
%   -sum_{l=1}^NR_o pR(j,l)||xiR(j)(t)-eR(l)||^2 + {uR(j)(t)}'RR(j)uR(j)(t))
%   +sum_{i=1}^{NB_u} cB(i)*(||xiB(i)(tf)-qB(i)(tf)||)^2 - dB(i)*(etaB(i)(tf))^2)
%   -sum_{i=1}^{NR_u} cR(i)*(||xiR(i)(tf)-qR(i)(tf)||)^2 - dR(i)*(etaR(i)(tf))^2)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

aB(1)=0.05; % weight on the distance between blue 1 and its target.
bB(1)=0;    % weight on the number of blue 1's platforms (running cost).
aR(1)=0.05; % weight on the distance between red 1 and its target.
bR(1)=0;    % weight on the number of red 1's platforms (running cost).

% weights on control command, velocity and firing intensity, for blue 1.
RB_1=([800 0 0; 0 800 0; 0 0 200]); % weights on control command for blue 1.
RR_1=([800 0 0; 0 800 0; 0 0 200]); % weights on control command for red 1.
cB(1)=0;   % weight on the distance between blue 1 and its target at final time.
dB(1)=0.1; % weight on the terminal number of blue 1's platforms.
cR(1)=0;
dR(1)=10;  % or 40 or 60.

Qvec=2*[aB(1); aB(1); -bB(1); 0; -aR(1); -aR(1); bR(1); 0];
Qvecf=2*[cB(1); cB(1); -dB(1); 0; -cR(1); -cR(1); dR(1); 0];
g_Q = diag(Qvec);
g_q = 2*[-aB(1)*qB(1,1); -aB(1)*qB(1,2); 0; 0;
         aR(1)*qR(1,1); aR(1)*qR(1,2); 0; 0];
g_r1 = zeros(3*NB_u,1);
g_R1 = diag([diag([RB_1])]);
g_Qf = diag(Qvecf);
g_qf = 2*[-cB(1)*qB(1,1); -cB(1)*qB(1,2); 0; 0; cR(1)*qR(1,1);
          cR(1)*qR(1,2); 0; 0];

2.6.2  Scenario  File  for  Joust 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% UNIT PROPERTIES

% numbers of categories
un.NBc=5; un.NRc=5;

% each category is represented by an integer,
% which will be used for indexing into parameter matrices
% 1: ground troops  2: air defense  3: bombers
% 4: interceptors   5: SEAD

% numbers of units
un.NBu=1; un.NRu=1;

% categories of units for each force
% Blue 1 is a bomber unit. Blue 2 is an interceptor unit
% Red 1 is a ground troop unit. Red 2 is an interceptor unit
un.cB=[4]; un.cR=[4];

% descriptive names of units
un.nameB = {'interceptor'}; un.nameR = {'bomber'};

% Assume that one unit can attack only one enemy unit,
% but one unit can be attacked by multiple units
% In this example, B1 shoots at R1, B2 shoots at R2
% R1 shoots at B2, R2 shoots at B1

% matrix form
un.fB=[1]; un.fR=[1];

% from the shooter's perspective:
% Blue unit i shoots at all the Red units with indices un.fsB{i}
un.fsB={1}; un.fsR={1};

% from the shootee's perspective:
% Blue unit i is shot at by all the Red units with indices un.feB{i}
un.feB={1}; un.feR={1};

% state vectors for each force
% xB=[xiB_1x; xiB_1y; etaB_1; zetaB_1; xiB_2x; xiB_2y; etaB_2; zetaB_2]

% state vector for the whole system
% xsys=[xB; xR]

% initial conditions for states
% [xiB_1x; xiB_1y; etaB_1; zetaB_1; xiB_2x; xiB_2y; etaB_2; zetaB_2]
xB_init=[20; 50; 10; 10]; xR_init=[80; 52; 10; 10];
x_init=[xB_init; xR_init];

% control vector for a Blue unit i
% uB_i=[muB_ix; muB_iy; piB_i]

% control vector for the Blue force
% uB=[uB_1; uB_2; ...; uB_NBu];

% control vector for the whole system
% usys=[uB; uR]

numinputs = 3*(un.NBu+un.NRu); numstates = 4*(un.NBu+un.NRu);
contr_uplim = ones(1, numinputs);
contr_lolim = [];
for i = 1:un.NBu+un.NRu,
    contr_lolim = [contr_lolim, -1, -1, 0];
end;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% CONSTANT PARAMETERS (WHICH DEPEND ON THE SCENARIO)

% maximum speeds, km/min
pm.alphaB=[0.5; 0; 10; 10; 10]; pm.alphaR=[0.5; 0; 10; 10; 10];

% parameter in probability of engagement function
pm.sigmaB=ones(un.NBc,un.NRc); pm.sigmaR=ones(un.NRc,un.NBc);
% Example: sigma for Blue unit i against Red unit j is
% pm.sigmaB(un.cB(i),un.cR(j))

% modification factor for number of engagements
pm.betaengB=ones(un.NBc,un.NRc); pm.betaengR=ones(un.NRc,un.NBc);

% modification factor for prob kill of a weapon
pm.betawepB=ones(un.NBc,un.NRc); pm.betawepR=ones(un.NRc,un.NBc);

% prob kill of weapon type i against platform type j
pm.pkillB=0.8*ones(un.NBc,un.NRc);
pm.pkillR=0.8*ones(un.NRc,un.NBc);

% salvo size of platform type i shooting at platform type j
pm.salvoB=ones(un.NBc,un.NRc); pm.salvoR=ones(un.NRc,un.NBc);

% parameter of the distance factor (varphi) function
pm.rzeroB=5*ones(un.NBc,un.NRc); pm.rzeroR=5*ones(un.NRc,un.NBc);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TERRAIN INFORMATION
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% the rectangular zone for the mission
% coordinates of lower left and upper right corners, in km
% zone_lim=[[xmin;ymin],[xmax;ymax]]
tr.zone_lim=[[0;0],[100;100]];

% obstacle locations, radii and names
tr.NBo=0; tr.NRo=0;
tr.obsB(1).loc=[0 0]; % x and y, in km
tr.obsB(1).rad=0; tr.obsB(1).name='';
tr.obsR(1).loc=[0 0]; % x and y, in km
tr.obsR(1).rad=0; tr.obsR(1).name='';

% fixed target locations, sizes and names
tr.NBt=1; tr.NRt=1; q1B=80; q2B=50; tr.tarB(1).loc=[q1B q2B];
%tr.tarB(1).loc=[80 50]; % x and y, in km
tr.tarB(1).size=0; % not used in this version
tr.tarB(1).name='Btarget'; q1R=20; q2R=52;
tr.tarR(1).loc=[q1R q2R]; % x and y, in km
%tr.tarR(1).loc=[20 32];
tr.tarR(1).loc=[q1R q2R]; % x and y, in km
tr.tarR(1).size=0; tr.tarR(1).name='Rtarget';

% initial and final times for mission, unit of time is 1 min
t_initfin=[0; 20];

% use all caps for global variables or make them stand out in some way
global NOM_INPUTTRAJ NOM_STATETRAJ NOM_T LASTFB

load joust_nominal; % load the nominal trajectory

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% weights in the cost function of the nonlinear-quadratic game
% min max J(uB,uR)
%  uB  uR
%
% J(uB,uR) = (1/2) *
%   integral[ti,tf]{x'*Q*x + 2*q'*x + 2*r1'*uB - 2*r2'*uR
%                   + uB'*R1*uB - uR'*R2*uR}
%   + (1/2)*x(tf)'*Qf*x(tf) + qf'*x(tf)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

aB=1; aR=1; bB=10; bR=10;
global g_Q g_q g_r1 g_r2 g_R1 g_R2 g_Qf g_qf

% [xiB_1x; xiB_1y; etaB_1; zetaB_1; xiB_2x; xiB_2y; etaB_2; zetaB_2]
Qvec = 0.1*[1; 1; 0; 0; -1; -1; 0; 0];
%Qvec = [aB;aB;-bB;0;-aR;-aR;bR;0];
g_Q = diag(Qvec);

%g_q = zeros(8,1);
g_q = 0.1*[-80; -50; 0; 0; 20; 52; 0; 0];
g_r1 = zeros(3,1); g_r2 = zeros(3,1);
g_R1 = 150*diag([4 4 2]); g_R2 = 150*diag([4 4 2]);
g_Qf = diag([0 0 -0.20 0 0 0 5 0]);
% 15
% 35
g_qf = zeros(8,1); clear aB aR bB bR;
Chapter 3


Experiment 3: Controller Performance under Noise in the State Observation

3.1  Executive  Summary 

We performed a series of experiments to evaluate the effectiveness of the current differential game
technology as a means of countering enemy actions under idealized situations with perfect information
about the enemy initial conditions and objectives, but with noisy measurements of the enemy state. Our
main finding is that while the average values look good, individual sample paths can deviate from them
considerably. One can conclude that the game theoretic controller in the CPC (Controller-Plant-Controller)
setup is sensitive to observation noise. The first step to remedy the noise problem is to implement
proper filters in the controllers.


3.2  Purpose  of  the  Experiment 

The purpose of the experiments is to test the behavior of the game theoretic controller in the CPC
(Controller-Plant-Controller) setup when noise is added to the measurements of the enemy state.


3.3  Hypothesis  to  Prove  or  Disprove 

The  current  differential  game  technology  provides  an  effective  means  of  countering  enemy  actions  under 
idealized  situations  with  perfect  information  about  the  enemy  initial  conditions  and  objectives,  but  with 
noisy  measurements  of  the  enemy  state. 


3.4  Experimental  Setup 

We performed a series of experiments to evaluate the effectiveness of the current differential game
technology as a means of countering enemy actions under idealized situations with perfect information
about the enemy initial conditions and objectives, but with noisy measurements of the enemy state. Here,
the plant and the internal models are the same, i.e., the MDCM. Increasing levels of noise are added to
the state variables when constructing the observed state variables (the output variables). The control
actions of the Blue and Red teams are generated by the proposed game theoretic algorithm.

The Sequential Linear Quadratic Method (SLQM), as described in [3] and [4], solves only a mathematical
problem. Once the weights and the parameters are set, the method computes a solution to the differential
game. This means that all the future actions of both parties are determined. They are described as
functions of time whose domain is the duration of the game. In this setup neither of the parties has any
initiative; instead, a third party with complete knowledge of all the resources of both sides and of the
battlefield does the computations.

The  fact  that  there  is  only  one  intelligent  entity  who  does  all  the  computations  is  quite  unnatural  since 
in  a  war  there  are  (generally)  three  separate  entities:  the  friendly  side,  the  enemy  and  the  battlefield.  The 
battlefield  determines  the  rules,  the  dynamics  of  the  war.  The  information  coming  from  the  battlefield  is 
observed  by  each  side  with  possible  addition  of  noise  and  corruption  and  it  is  processed  by  the  friendly 
and  enemy  sides  using  the  intelligence  they  have. 

The natural step in making the experimental setup more realistic is therefore to separate the three
entities mentioned above. This results in the Controller-Plant-Controller (CPC) setup of Figure 3.1.


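[Block diagram: the Blue Controller and the Red Controller each send their controls (uBlue, uRed) to the
Battlefield and receive its states as a noisy observation (Noisy x); NoiseB and NoiseR are added on the
Blue and Red observation channels, respectively.]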

Figure  3.1:  Controller-Plant-Controller  Setup 

The CPC setup realizes the separation of the battlefield from the enemy and friendly sides. In this
way noise can be injected separately into the observation channels of each side. More importantly,
each side has its own intelligence, which computes its control inputs to the battlefield.
In our case, based on the development of the game theoretical method, the intelligence for both sides is
chosen to be the differential game solver based on SLQM. This means that the Red and Blue sides
each have their own model of the battlefield and of the cost function, possibly with different weights,
parameter mismatches and parametric uncertainties. Each side computes the solution of the game it posed
(internal to the controller) and in this way prepares its future inputs to the battlefield. It is important
to note that the sides can model the battle as they want; the implementation leaves this option free.
The Blue side might calculate its inputs to the battlefield with the game controller scheme
whereas the Red side might use another scheme, such as heuristic methods or artificial intelligence. The
advantage of the CPC setup comes from its flexibility and modularity. The model for the battlefield is
kept separate from the rest; all it needs are the inputs, and it does not matter how the inputs are
calculated. This also opens the possibility of changing the battlefield (plant) model while keeping the
same setup.

Another subtle point concerns the modeling of the enemy side. For example, the Blue side might be
using the game technology to compute its inputs to the battlefield. The Blue side then has the Red side
as the opponent in its internal game. The Red side, on the other hand, does not
necessarily have to use the game technology, but might instead be using a different approach, say a
heuristic controller. Regardless of this, since the Blue side is sticking to the game technology, it
models Red’s actions as those of the opponent in its own game.

Pushing the content of a controller one step further, since the sides are separated they can
independently implement their detection technology or any future new technology in their controllers.
For instance, a weight estimator for the enemy actions, which looks like a highly challenging task at
this point, could be implemented in the controllers of both sides in the future. Since the enemy actions
are not known at the beginning of the battle, the enemy weights in the internal game model are unknown.
Starting from a best guess for the enemy weights at the beginning of the battle, the weight estimator can
update them based on the past enemy actions. This “adaptive” approach is not implemented yet, but is
certainly worth investigating in the future. An adaptation scheme must be designed carefully,
since the subject is new and not much is known.

Apart from the internals of the controllers, it is clear that each side will have better information
from the battlefield about itself than about the enemy side. This structure
can easily be modeled since the setup allows noise to be injected into the state observation in any
desired manner, i.e., any kind of noise can be injected into any desired state.

Implementation  of  the  CPC  set-up 

Although some of the conceptual entities are not programmed separately, it is very important to
understand the ideas behind the implementation. The general idea is that each of the
Blue and Red commanders has a plan for the inputs they will supply to the battlefield even before
the simulation starts. This is done by setting up their own games and computing the solution by using
SLQM. The inputs they supply to the battlefield are just the solutions of these separate games. This
in a sense models the knowledge available prior to the battle. The calculations are carried out in the
Game Calculator block in Figure 3.2. The input-state pair coming from the game calculator is then fed to
the Storage block of Figure 3.2. It is assumed here that both sides have chosen the game
technology as their intelligence, but as described above this is not necessary. All one needs in order
to incorporate any other kind of intelligence in the simulation is to supply the inputs it computes to
the battlefield.

Once the pre-battle computations are finished, the solution of the game, i.e., the inputs and the
corresponding state trajectories, is kept in the storage block of each side. At the beginning of the
simulation, which can be visualized in Figure 3.1, both sides supply their stored inputs to the
battlefield. In return, the states corresponding to the inputs are observed from the battlefield with
the addition of noise. This information is collected at the controller, filtered, and compared to the
stored value of the states. The storage block mainly keeps the precomputed input-state pair for each
side inside the controller. It sends the precomputed inputs to the battlefield as long as the actual
states are close to the predicted ones. In other words, as long as the battle takes its course as
predicted, the controller just feeds the precomputed inputs. If, however, at any given moment the actual
values of the states deviate from the precomputed values, the game calculator is called again; it
computes a new input-state pair using the (possibly) updated weights and sends it to the storage block.
The comparison is performed by the Decision Block. The new computation starts from the time at which the
deviation occurred and has as its final time the final time of the overall battle simulation. This is by
no means a restriction, since the time horizon does not have to be constant and can be extended further
if necessary.
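The interplay of the storage block, the decision block and the game calculator described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the report's MATLAB implementation: `solve_game` is a hypothetical stand-in for the SLQM game calculator (here it simply drives a scalar state toward zero), and the absolute-deviation test, the threshold value and the step size are chosen only for the example.

```python
import numpy as np

def solve_game(x_now, t_now, t_final, dt):
    """Hypothetical stand-in for the game calculator (SLQM in the report).

    Returns a precomputed (inputs, states) pair from t_now to t_final.
    The toy plan below just drives a scalar state toward zero.
    """
    n = int(round((t_final - t_now) / dt))
    inputs = np.empty(n)
    states = np.empty(n + 1)
    states[0] = x_now
    for k in range(n):
        inputs[k] = -0.5 * states[k]
        states[k + 1] = states[k] + dt * inputs[k]
    return inputs, states

def run_controller(observe, x0, t_final=20.0, dt=1.0, threshold=1.0):
    """Feed the stored inputs; re-solve when the observation leaves the plan."""
    n_steps = int(round(t_final / dt))
    inputs, states = solve_game(x0, 0.0, t_final, dt)  # pre-battle computation
    start = 0  # step index at which the currently stored plan begins
    applied, recomputed = [], []
    for k in range(n_steps):
        x_obs = observe(k)  # (noisy) state observed from the battlefield
        # Decision block: compare the observation with the stored prediction.
        deviates = abs(x_obs - states[k - start]) > threshold
        if deviates:
            # New game from the current time to the original final time.
            inputs, states = solve_game(x_obs, k * dt, t_final, dt)
            start = k
        recomputed.append(int(deviates))
        applied.append(inputs[k - start])
    return np.array(applied), np.array(recomputed)
```

The second return value plays the role of the zero-or-one "controller times" signal: it is 1 at every step where the game had to be recomputed.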
Figure 3.2: Conceptual Representation of the Controller


Game  Scenario 

The scenario used in all the experiments is the crossll scenario, in which a Red bomber is trying to
reach its target in the south starting from its base in the north. The Blue unit of interceptors, on the
other hand, starts from its base in the west and flies east to intercept the Red bomber.

The  payoff  function  is  given  by 


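\[
\begin{aligned}
J(x; u_B, u_R) = \frac{1}{2} \int_{t_0}^{t_f} \Big[ & x(t)' Q(t) x(t) + 2\, x(t)' d(t) \\
 & + u_B(t)' R_B(t) u_B(t) + 2\, u_B(t)' r_B(t) \\
 & - u_R(t)' R_R(t) u_R(t) - 2\, u_R(t)' r_R(t) \Big]\, dt \\
 & + \frac{1}{2}\, x(t_f)' Q_f x(t_f) + x(t_f)' r_f .
\end{aligned}
\tag{3.1}
\]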
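To make the structure of the payoff (3.1) concrete, the following Python sketch evaluates it on sampled trajectories with the trapezoid rule. It is only an illustration and not part of the report's MATLAB code: the function name `payoff`, the array layout and the assumption of time-constant weights are choices made for the example.

```python
import numpy as np

def payoff(t, x, uB, uR, Q, d, RB, rB, RR, rR, Qf, rf):
    """Evaluate the quadratic payoff on sampled trajectories (trapezoid rule).

    t: (N,) sample times; x: (N, n) states; uB, uR: (N, m) controls.
    The weights are assumed constant in time for simplicity.
    """
    running = (
        np.einsum("ki,ij,kj->k", x, Q, x) + 2.0 * x @ d
        + np.einsum("ki,ij,kj->k", uB, RB, uB) + 2.0 * uB @ rB
        - np.einsum("ki,ij,kj->k", uR, RR, uR) - 2.0 * uR @ rR
    )
    # Trapezoid rule for the running-cost integral over [t[0], t[-1]].
    integral = np.sum(0.5 * (running[1:] + running[:-1]) * np.diff(t))
    xf = x[-1]
    # Half the integral plus the terminal quadratic and linear terms.
    return 0.5 * integral + 0.5 * xf @ Qf @ xf + xf @ rf
```

Blue's control enters with a positive sign and Red's with a negative sign, which is what makes the problem a min-max game rather than a single-player optimal control problem.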


All  the  parties,  Blue  side,  Red  side  and  the  plant  have  the  same  model  with  the  same  parameters. 
The  parties  are  using  the  same  intelligence  based  on  the  differential  game  theory.  In  this  experiment  the 
information  coming  from  the  battlefield  to  the  parties  is  noisy.  The  level  of  the  noise  is  increased  to  see 
how  the  controllers  will  react. 

As both sides compute their own internal game, it does not make sense to assign the same weights to
both sides: if all the weights were the same, both sides would calculate the same thing twice. Therefore
in all the experiments, whether noise or a parametric mismatch (Experiment 4) is introduced, there is
always a weight mismatch between the parties. All other factors are included in the experiments
gradually. The most complicated experiments are the ones with noisy observations where there are both
weight differences and parameter mismatches (between the plant and the internal parameters of both
controllers, see Experiment 4). A good way to think of it is that the weight assignments are the decision
on the strategy, whereas the parameter mismatches (between the plant and the other controller) and the
noise are the observation problems.

All the weights and the parameters are the same for both parties except the final cost weights. The
running cost weights for both sides are given as:

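\[
Q = \begin{bmatrix}
1/10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/10 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1/10 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1/10 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

\[
d = [\, -8 \;\; -5 \;\; 0 \;\; 0 \;\; 5 \;\; 2 \;\; 0 \;\; 0 \,]
\]

\[
R_B = \begin{bmatrix} 800 & 0 & 0 \\ 0 & 800 & 0 \\ 0 & 0 & 200 \end{bmatrix},
\qquad
R_R = \begin{bmatrix} 800 & 0 & 0 \\ 0 & 800 & 0 \\ 0 & 0 & 200 \end{bmatrix}
\]

\[
r_B = [\, 0 \;\; 0 \;\; 0 \,], \qquad r_R = [\, 0 \;\; 0 \;\; 0 \,]
\]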

The  weights  which  are  different  are  the  final  cost  matrices: 

'00  0  00000 

00  0  00000 

0  0  -1/5  0  0  0  0  0 

00  0  00000 

QBf  = 

00  0  00000 

00  0  00000 

00  0  000  80  0 

00  0  00000 

'00  0  00000 

00  0  00000 

0  0  -1/10  0  0  0  0  0 

00  0  00000 

QRf  = 

00  0  00000 

00  0  00000 

00  0  000  60  0 

00  0  00000 

As a reminder, the plant model describes the evolution of the states $\xi_1$ and $\xi_2$, the horizontal
and vertical positions, $\eta$, the number of platforms, and $\zeta$, the number of weapons per platform.
The inputs $\mu_1$ and $\mu_2$ describe the velocity vector and $\pi$ the firing intensity.

\[
x = [\, \xi_{b1} \;\; \xi_{b2} \;\; \eta_b \;\; \zeta_b \;\; \xi_{r1} \;\; \xi_{r2} \;\; \eta_r \;\; \zeta_r \,]
\]

\[
u = [\, \mu_{b1} \;\; \mu_{b2} \;\; \pi_b \;\; \mu_{r1} \;\; \mu_{r2} \;\; \pi_r \,]
\]

Table 3.1: The Noise Levels for the Experiments

Experiment   Noise Level
 1             1%
 2             2%
 3             3%
 4             4%
 5             5%
 6             6%
 7             7%
 8             8%
 9             9%
10            10%
11            11%
12            15%
13            20%
14            25%
15            30%
16            35%
17            40%
18            50%
19            60%
20            90%
21           130%

The final cost is only on the numbers of platforms, and its weights differ between the internal games of
the Blue and the Red side. Looking at the final cost matrix of the Blue side, Blue puts more weight than
Red both on its own final number of platforms (magnitude 1/5 versus 1/10) and on Red’s final number of
platforms (80 versus 60). This can be interpreted as Blue trying to eliminate as many of Red’s platforms
as it can while at the same time trying to preserve its own. The Red side, on the other hand, puts less
cost on its own and on Blue’s final number of platforms. This can be interpreted as Red wanting to reach
its target, even at the cost of losing platforms, without engaging Blue too much.


3.5  Experimental  Results 

At this first step of the experiments the noise is added only to the observation of the enemy states. It
is assumed that each side has perfect information about its own states. The noise is zero-mean white
noise generated with the MATLAB random number generator. Because the state components have different
numerical ranges, the noise injected into different states cannot have the same amplitude.
The amplitude of the injected noise is therefore specified relative to the initial condition of the
state. For instance, if the initial horizontal position of a unit is 80 and the number of platforms is
10, then with 10% noise the maximum noise amplitude is 8 for the position and 1 for the number
of platforms. Experiments with increasing noise levels are carried out on the scenario.
The noise levels are summarized in Table 3.1.
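The amplitude scaling described above can be sketched in a few lines. This is a Python illustration rather than the report's MATLAB code, and it makes one assumption the text does not state: the noise is drawn uniformly from [-amplitude, +amplitude]. The function name `noisy_enemy_observation` is hypothetical.

```python
import numpy as np

def noisy_enemy_observation(x_enemy, x_enemy_init, level, rng):
    """Add zero-mean noise whose amplitude is a fraction of each initial state.

    level is the noise level as a fraction (0.10 for 10%). A uniform
    distribution on [-amplitude, +amplitude] is an assumption of this sketch.
    """
    amplitude = level * np.abs(np.asarray(x_enemy_init, dtype=float))
    noise = rng.uniform(-1.0, 1.0, size=amplitude.shape) * amplitude
    return np.asarray(x_enemy, dtype=float) + noise

rng = np.random.default_rng(0)
x_init = np.array([80.0, 10.0])  # horizontal position, number of platforms
x_obs = noisy_enemy_observation(x_init, x_init, 0.10, rng)
# With 10% noise the position error is at most 8 and the platform error at most 1.
```

Only the enemy part of the state vector would be passed through such a function; each side's own states are observed exactly in this experiment.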

Figure 3.3 represents the development of the battle with perfect observation. Figures 3.4 - 3.7 show
the average over 100 computed sample paths. The states and the controls are plotted; in addition,
the times at which the deviation becomes too large and the game is recomputed are plotted in the
Controller Times graph. The value of the controller times function is either zero or one, and its average
over the sample paths is presented in the graph. If one reads 0.4 at the 6th minute, this means that
in 40 out of 100 sample paths the deviation exceeded the threshold and the game calculator was activated
at that time. Comparing the average values for different levels of noise, it is clear that they look
similar in general. The interesting part is the controller times function. Its average value shows that,
for high noise levels, the controller was called in a small number of sample paths during almost the
entire battle.
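The reading of the Controller Times graph can be reproduced with a small Python example. The array layout (one row per sample path, one column per minute) and the function name are assumptions made for this sketch; the data are synthetic and only mirror the 0.4 example in the text.

```python
import numpy as np

def average_controller_times(flags):
    """flags[p, k] is 1 if sample path p re-solved the game at step k, else 0."""
    flags = np.asarray(flags, dtype=float)
    return flags.mean(axis=0)

# Synthetic data: 100 sample paths over minutes 0..20; 40 paths replan at minute 6.
flags = np.zeros((100, 21), dtype=int)
flags[:40, 6] = 1
avg = average_controller_times(flags)
# avg[6] is the fraction of sample paths that replanned at the 6th minute.
```

Because the average hides which paths replanned, it is worth inspecting individual rows of `flags` as well, which is what the sample-path figures do for the states.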

A  better  understanding  will  be  reached  when  certain  sample  paths  are  observed  for  different  noise 
levels. 

[Plot panels: Positions of the units; Number of platforms]


Figure  3.3:  The  Scenario  without  any  Noise 

Figures 3.8-3.14 show a few sample paths for different noise levels. When the noise level is
low there is no significant problem. However, as the amplitude of the noise is increased, problematic
sample paths appear, as shown in Figures 3.10, 3.12 and 3.13. Loosely speaking, although the results
look good on average, the spread around the average is large.
[Plot panels: Positions of the units; Number of platforms]

Figure 3.4: Average Values of the States and Controls over 100 Sample Paths for the Noise Amplitude 1%


[Plot panels: Positions of the units; Number of platforms]

Figure 3.5: Average Values of the States and Controls over 100 Sample Paths for the Noise Amplitude 10%


[Plot panels: Positions of the units; Number of platforms; Fire intensities; Speed Controls (time axis)]

Figure 3.6: Average Values of the States and Controls over 100 Sample Paths for the Noise Amplitude 90%


[Plot panels: Positions of the units; Number of platforms]

Figure 3.7: Average Values of the States and Controls over 100 Sample Paths for the Noise Amplitude 130%


[Plot panels: Positions of the units; Number of platforms]

Figure 3.8: A Sample Path (Noise Amplitude 1%)


[Plot panels: Positions of the units; Number of platforms]

Figure 3.9: A Sample Path (Noise Amplitude 10%)


[Plot panels: Positions of the units; Number of platforms]

Figure 3.10: A Sample Path (Noise Amplitude 90%)


[Plot panels: Positions of the units; Number of platforms (time axis)]

Figure 3.11: A Sample Path (Noise Amplitude 90%)


[Plot panel: Positions of the units]

Figure 3.12: A Sample Path (Noise Amplitude 130%)


[Plot panels: Positions of the units; Number of platforms]

Figure 3.13: A Sample Path (Noise Amplitude 130%)


[Plot panels: Positions of the units; Number of platforms]

Figure 3.14: A Sample Path (Noise Amplitude 130%)


3.6  Conclusions  and  Recommendations 


Our main finding is that while the average values look good, individual sample paths can deviate from
them considerably. One can conclude that the CPC is sensitive to observation noise. The first step to
remedy the noise problem is to implement proper filters in the controllers. After this, new experiments
must be run to test the robustness of the filters and their effect on the observations.
Chapter 4


Experiment 4: Controller Performance under Parameter Variations

4.1  Executive  Summary 

The purpose is to test how the Controller-Plant-Controller (CPC) setup reacts to parameter
mismatches between the battlefield and the sides as well as to parameter mismatches between the sides.
Assuming that both sides have chosen game theory as the intelligence behind their controllers,
systematic tests have been performed to investigate the sensitivity of the proposed game-theoretic
controller, i.e., how strongly it reacts to changes in the parameters. The important conclusion to draw
from these experiments is that even a single parameter can have an important effect on the outcome of
the battle. It is therefore very important to be able to estimate the enemy parameters in order to
succeed in the battle simulation.


4.2  Purpose  of  the  Experiment 

The purpose is to test how the Controller-Plant-Controller (CPC) setup reacts to parameter
mismatches between the battlefield and the sides as well as to parameter mismatches between the sides.


4.3  Hypothesis  to  Prove  or  Disprove 

The  current  differential  game  technology  is  affected  by  parameter  mismatches.  The  purpose  of  this 
experiment  is  to  investigate  its  sensitivity,  i.e.,  how  strongly  the  proposed  game-theoretic  controller 
reacts  to  changes  in  the  parameters. 


4.4  Experimental  Setup 

The  experimental  setup  is  the  same  as  for  Experiment  3  and  is  described  in  Chapter  3. 


4.5  Experimental  Results 

The purpose of this set of experiments is to test how the Controller-Plant-Controller setup reacts
to the parameter mismatches between the battlefield and the sides as well as to the parameter mismatches
between the sides. Assuming that both sides have chosen game theory as the intelligence behind their
controllers, the investigation starts by looking at weight mismatches between the sides.

Table 4.1: Different Experimental Setups for Weight Mismatches, Weights on Final Number of Red Platforms

Red’s weight   Blue has   Blue has   Blue has
20             10         20         40
40             20         40         60
60             40         60         80

4.5.1  Weight  Mismatches 

If both the Red and Blue sides had chosen to use the game technology with the same weights in their cost
functions, it would not make any sense to run the simulations using the CPC: there would be a single
game and no need for the CPC setup. The difference between the strategies
of the sides is implemented by the difference in the internal weights of the controllers. For the
following experiments the weight differences are placed on the final number of platforms, exactly as in
the noisy experiments. There is a constant mismatch between the parties in all the
experiments: Blue assigns 0.2 to its own final number of platforms while Red assigns 0.1 to Blue’s
final number of platforms. The weights that are varied on the Blue side are the ones on the
final number of Red platforms, as summarized in Table 4.1.
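The experiment grid implied by Table 4.1 can be enumerated as below. This is an illustrative Python sketch, not the report's code; the dictionary layout and the helper `final_cost_diagonal` are assumptions, and the sign convention (negative weight on Blue's platforms, positive weight on Red's) is taken from the final cost matrices given for the noisy experiments.

```python
# Weights on the final number of Red platforms, transcribed from Table 4.1:
# Red's own weight -> the three weights tried in Blue's internal game.
TABLE_4_1 = {20: (10, 20, 40), 40: (20, 40, 60), 60: (40, 60, 80)}

def final_cost_diagonal(weight_blue_platforms, weight_red_platforms):
    """Diagonal of the 8x8 final cost matrix for the state ordering
    [xi_b1, xi_b2, eta_b, zeta_b, xi_r1, xi_r2, eta_r, zeta_r]."""
    return [0, 0, -weight_blue_platforms, 0, 0, 0, weight_red_platforms, 0]

experiments = []
for red_weight, blue_weights in TABLE_4_1.items():
    for blue_weight in blue_weights:
        experiments.append({
            "QBf_diag": final_cost_diagonal(0.2, blue_weight),  # Blue's internal game
            "QRf_diag": final_cost_diagonal(0.1, red_weight),   # Red's internal game
        })

print(len(experiments))  # 9
```

Each entry differs from the others only in the two weights on the final number of Red platforms; the 0.2 versus 0.1 mismatch on Blue's platforms is the same in all nine cases.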

4.5.2  Experiments  with  Weight  Mismatches 

For each value of Red’s weight in Table 4.1, Blue puts its weight below, at the same level as, and above
Red’s weight. This results in nine experiments. Figures 4.1-4.3 show that Blue chases the Red side for a
longer period of time when it has a higher weight assigned. These experiments show
that strategies can be assigned through the weights in the internal games of the controllers.
[Plot panels: Positions of the units; Number of platforms]

Figure 4.1: Case 1: Red’s Weight is 20


[Plot panels: Positions of the units; Number of platforms; Fire intensities (time axis)]

Figure 4.2: Case 2: Red’s Weight is 40


[Plot panels: Positions of the units; Number of platforms]

Figure 4.3: Case 3: Red’s Weight is 60


Table 4.2: Parameter Mismatches for the Experiments

              Plant   Blue Side   Red Side
Case 1
pkill Blue    0.3     0.3         0.1-0.8
pkill Red     0.3     0.1-0.8     0.3
Case 2
pkill Blue    0.8     0.8         0.1-0.8
pkill Red     0.8     0.1-0.8     0.8

Differential  Game  Weights  for  Parametric  Mismatch 


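The running weights for these experiments are the same as in the noisy experiments. The final cost
matrices are given as:

\[
Q_{Bf} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1/5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 40 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

\[
Q_{Rf} = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1/10 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 20 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]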

Exactly as in the test of Experiment 3, the weights are placed only on the final numbers of platforms
for both sides.
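
As a rough illustration of how such diagonal final-cost matrices act on a terminal state, the following Python sketch evaluates a quadratic form for each side. It is only a sketch under stated assumptions: the form x^T Q x (without any scaling factor), the helper names and the sample terminal state are hypothetical, and the reading that entry 3 weights Blue's platforms and entry 7 weights Red's platforms is inferred from the surrounding text rather than stated with the matrices.

```python
# Sketch only. Assumptions (not from the report): the terminal cost is taken as
# x^T Q x, and the sample terminal state below is hypothetical.

def diag_matrix(entries):
    """Return a square matrix (list of lists) with the given diagonal entries."""
    size = len(entries)
    return [[float(entries[i]) if i == j else 0.0 for j in range(size)]
            for i in range(size)]

def quadratic_form(q, x):
    """Evaluate x^T Q x for a square matrix q and a state vector x."""
    size = len(x)
    return sum(x[i] * q[i][j] * x[j] for i in range(size) for j in range(size))

# Final cost matrices: nonzero entries at (3,3) and (7,7) in 1-based indexing.
Q_Bf = diag_matrix([0, 0, -1 / 5, 0, 0, 0, 40, 0])
Q_Rf = diag_matrix([0, 0, -1 / 10, 0, 0, 0, 20, 0])

# Hypothetical terminal state: 10 Blue platforms (entry 3), 5 Red platforms (entry 7).
x_final = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 5.0, 0.0]

blue_terminal = quadratic_form(Q_Bf, x_final)
red_terminal = quadratic_form(Q_Rf, x_final)
```

Only the two platform entries contribute, which is the sense in which the weights act solely on the final numbers of platforms.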


Experimental  Results  for  Parametric  Mismatch 

The  experiments  can  be  categorized  under  two  major  cases: 

•  The battlefield dictates a low probability of kill
•  The battlefield has a high probability of kill

In both cases it is assumed that each side has perfect information about its own probability of kill,
but does not know exactly, in its internal model, the probability of kill of the enemy. As summarized
in Table 4.2, experimental results are obtained by varying the probability of kill of the enemy side in
the internal model of both controllers. The results are shown in the following figures under the low
probability of kill and high probability of kill sections.

Low Probability of Kill

(Panels in each figure: positions of the units; number of platforms versus time.)

Figure 4.4: Both sides exactly know the probability of kill 0.3

Figure 4.5: Red Underestimates Blue (Red thinks pkillBlue=0.1) pkill:0.3

Figure 4.8: Blue Overestimates Red (Blue thinks pkillRed=0.8) pkill:0.3

Figure 4.9: Both Sides Underestimate (Red thinks pkillBlue=0.1 similarly for Blue) pkill:0.3

Figure 4.10: Red Underestimates Blue (Red thinks pkillBlue=0.1) and Blue overestimates Red (Blue
thinks pkillRed=0.8) pkill:0.3

Figure 4.11: Blue Underestimates Red (Blue thinks pkillRed=0.1) and Red overestimates Blue (Red
thinks pkillBlue=0.8) pkill:0.3

High Probability of Kill

(Panels in each figure: positions of the units; number of platforms.)

Figure 4.13: Both sides exactly know the probability of kill 0.8

Figure 4.14: Red Underestimates Blue (Red thinks pkillBlue=0.1) pkill:0.8

Figure 4.15: Red Underestimates Blue (Red thinks pkillBlue=0.4) pkill:0.8

Figure 4.16: Blue Underestimates Red (Blue thinks pkillRed=0.1) pkill:0.8

Figure 4.17: Blue Underestimates Red (Blue thinks pkillRed=0.4) pkill:0.8

Figure 4.18: Both Sides Underestimate (Red thinks pkillBlue=0.1 similarly for Blue) pkill:0.8

Figure 4.19: Both Sides Underestimate (Red thinks pkillBlue=0.4 similarly for Blue) pkill:0.8

Figure 4.20: Both Sides Underestimate (Red thinks pkillBlue=0.4 Blue thinks pkillRed=0.1) pkill:0.8

Figure 4.21: Both Sides Underestimate (Red thinks pkillBlue=0.1 Blue thinks pkillRed=0.4) pkill:0.8

4.6  Conclusions  and  Recommendations 

It is clear from Figures 4.4-4.21 that the parameters have a significant impact on the differential game.
Whenever the Red side underestimates, as in Figures 4.5 and 4.14, it takes less evasive action and heads
towards the target. On the other hand, whenever the Red side overestimates, as in Figure 4.6, it takes
more evasive action. Similar remarks hold for the Blue side: when Blue underestimates the parameters,
as in Figures 4.7, 4.16 and 4.17, it starts chasing Red earlier than it does in the cases of exact
estimation or overestimation shown in Figures 4.4, 4.8 and 4.13. The remaining figures give an idea of
the effect of combining different types of parameter mismatches.

The important conclusion to draw from these experiments is that even a single parameter can have
important effects on the outcome of the battle. It is therefore very important to be able to estimate the
enemy parameters in order to succeed in the battle simulation. As a future research path, parameter
estimation from the enemy actions could be added to the difficult task of weight estimation from the
enemy actions.
Chapter 5


Experiment 5: Controller Computational Complexity

5.1  Executive  Summary 

The  purpose  of  experiment  5  is  to  test  Hypothesis  5:  The  computational  complexity  of  the  differential 
game  technology  based  controller,  combined  with  an  extended  Kalman  filter  or  a  nonlinear  observer, 
increases  quadratically  as  a  function  of  the  number  of  units  and  linearly  as  a  function  of  the  mission 
duration. 

A number of experiments have been performed to test this hypothesis.

In these experiments, both the plant and internal models are the same, given by the MDCM. In a
first set of experiments we increase the number of units in the scenario while the mission objectives and
duration are kept constant. In a second set, the mission duration is increased, while the mission objectives
and the number of units are kept constant. The computation time and the number of iterations required
for the computation of the control law to converge were recorded in both cases.

Our main conclusions are that the computational time required to reach the convergence criterion
depends on many factors, such as the unit categories, the number of units, the initial trajectories, the
weights in the cost function, the step size in our numerical procedure and the manner of engagement, as
well as the initial positions and target locations. Similarly, the number of iterations required to reach
the convergence criterion depends on the same factors. From our experimental results, the major factors
that affect the computational time are the number of units and the mission duration. As expected from
theoretical considerations, the computational time of the controller increased quadratically as a function
of the number of units. We also saw that it increased linearly as a function of the mission duration, while
the number of iterations remained relatively constant as a function of the number of units.


5.2  Introduction 

A number of experiments have been performed to test the following hypothesis: The computational
complexity of the differential game technology based controller, combined with an extended Kalman filter
or a nonlinear observer, increases quadratically as a function of the number of units and linearly as a
function of the mission duration.

In  these  experiments,  both  the  plant  and  internal  models  are  the  same,  given  by  MDCM.  In  a  first  set 
of  experiments  we  increase  the  number  of  units  in  the  scenario  while  the  mission  objectives  and  duration 
are  kept  constant.  In  a  second  set,  the  mission  duration  is  increased,  while  the  mission  objectives  and 
the  number  of  units  are  kept  constant.  The  computation  time  and  the  number  of  iterations  required  for 
the  computation  of  the  control  law  to  converge  were  recorded  in  both  cases. 
Table 5.1: Data For One vs. One

                              Blue         Red
Unit categories               fighter      interceptor
Initial no. of platforms      10           10
Initial no. of weapons        10           10
Initial position              (20,50)      (50,82)
Target location               (80,50)      (50,20)

Weights in cost function      Blue                      Red
                              Run. cost    Ter. cost    Run. cost    Ter. cost
Dist. to target               0.1          0            0.1          0
Velocity command              800          0            800          0
Firing intensities command    200          0            200          0
No. of platforms              0            0.2          0            20

5.3  Experiment  5.1:  The  Number  Of  Units  Is  Increased  With 
Fixed  Mission  Duration 

In this set of experiments, the mission duration is kept constant at 20 minutes. In the convergence test,
the control change is set to 0.01 and a step size of 0.5 is used.
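
To make the roles of the two numbers concrete, the following Python sketch shows one common form of relaxed fixed-point iteration with a stopping test on the change in the controls. It is an illustration only and not the report's algorithm: the update rule, the function names and the toy two-player best-response map are all assumptions introduced here.

```python
# Sketch only. Assumptions (not from the report): the controls are updated by a
# relaxed best-response iteration, and the toy game below is hypothetical.

def relaxed_iteration(best_response, u0, step=0.5, tol=0.01, max_iter=100):
    """Iterate u <- u + step * (best_response(u) - u) until the largest
    change in any control component falls below tol."""
    u = list(u0)
    for k in range(1, max_iter + 1):
        target = best_response(u)
        new_u = [ui + step * (ti - ui) for ui, ti in zip(u, target)]
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:
            return u, k
    return u, max_iter

def toy_best_response(u):
    """Best responses of a toy two-player quadratic game with Nash point (2/3, 2/3)."""
    u_blue, u_red = u
    return [1.0 - 0.5 * u_red, 1.0 - 0.5 * u_blue]

u_nash, iterations = relaxed_iteration(toy_best_response, [0.0, 0.0])
```

With step 0.5 and tolerance 0.01, this toy problem stops after four iterations; the iteration counts reported for the actual controller depend on the scenario and are not reproduced by this sketch.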

Five experiments have been done for each of the following cases: 1 vs. 1, 2 vs. 2, 3 vs. 3, 4 vs. 4
and 5 vs. 5. Across the five experiments for each n vs. n case (1 ≤ n ≤ 5), the unit categories, initial
conditions, target locations and nominal trajectories, as well as the weights in the cost function, may vary.
The computational time and the number of iterations are recorded for each experiment.

5.3.1  One  vs.  One 

Five  different  experiments  were  performed  in  this  case.  Here  we  report  on  one  example  of  those  five 
in  detail.  Table  5.1  summarizes  the  pertinent  information  for  the  two  opposite  forces  in  that  specific 
example. 

Figures  5.1  -  5.5  respectively  show  the  initial  trajectories,  the  control  update  at  each  iteration,  the 
Nash  solution  of  trajectories  and  the  corresponding  firing  intensities  as  well  as  number  of  platforms. 

The computational time required to reach the convergence criterion in this experiment is 141.935
seconds, and the number of iterations is 13.

In  the  other  four  experiments,  the  computational  time  required  to  reach  the  convergence  criterion  was 
around  140  seconds  and  the  number  of  iterations  ranged  from  13  to  16. 

Figure 5.1: Initial Trajectories In One vs. One

Figure 5.2: Control Update In One vs. One

Figure 5.3: Nash Trajectories In One vs. One

Figure 5.4: Nash Firing Intensities In One vs. One

Figure 5.5: Nash Number Of Platforms In One vs. One


Table 5.2: Data For Three vs. Three

                            B1        B2        B3        R1        R2             R3
Unit categories             bombers   bombers   grounds   bombers   interceptors   grounds
Initial no. of platforms    10        10        10        10        10             10
Initial no. of weapons      10        10        10        10        10             10
Initial position            (20,53)   (20,50)   (45,47)   (80,53)   (80,50)        (55,47)
Target location             (70,63)   (80,52)   (53,48)   (30,63)   (20,48)        (43,46)

5.3.2  Multi-units  Case 

We  also  report  on  one  of  five  3  vs.  3  experiments  as  an  example  for  the  multiple  units  case. 

Table  5.2  summarizes  the  pertinent  information  for  the  two  opposite  forces  in  that  specific  example 
and  Table  5.3  shows  the  weights  in  the  cost  function  in  that  example.  The  manner  of  engagement  in  that 
example  is:  B1  and  B2  are  allowed  to  attack  R1  and  B3  is  allowed  to  attack  R2.  R1  and  R2  are  allowed 
to  attack  B2  and  R3  is  allowed  to  attack  B3. 

Figures  5.6  -  5.10  show  respectively  the  initial  trajectories,  the  control  update  at  each  iteration,  the 
Nash  solution  of  trajectories  and  the  corresponding  firing  intensities  as  well  as  number  of  platforms. 

The computational time required to reach the convergence criterion in this experiment is 315.214
seconds, and the number of iterations is 11.

In the other four experiments, the computational time required to reach the convergence criterion was
around 320 seconds and the number of iterations ranged from 9 to 14.

Table 5.3: Weights In Cost Function For Three vs. Three

Running cost                  B1      B2      B3      R1      R2      R3
Distance to target            0.05    0.05    0.05    0.05    0.05    0.05
Velocity command              600     600     600     600     600     600
Firing intensities command    100     100     100     100     100     100

Terminal cost                 B1      B2      B3      R1      R2      R3
No. of platforms              0.05    1       0.5     10      0.5     0.05

B1: bomber, B2: bomber, B3: ground, R1: bomber, R2: interceptor, R3: ground

Figure 5.6: Initial Trajectories In Three vs. Three

Figure 5.7: Control Update In Three vs. Three (horizontal axis: iteration i)

Figure 5.8: Nash Trajectories In Three vs. Three

Figure 5.9: Nash Firing Intensities In Three vs. Three

Figure 5.10: Nash Number Of Platforms In Three vs. Three

5.3.3  Multi-units  Case  And  Computational  Complexity 

Similar experiments were performed for the 2 vs. 2 through 5 vs. 5 cases. For each fixed number of units,
five different experiments were performed. The computational time as well as the number of iterations
were recorded in each experiment. Figures 5.11 and 5.12 show, respectively, how the computational time
and the number of iterations required to reach the convergence criterion change as the number of units is
increased.

While the number of iterations remains close to constant for the 2 vs. 2 to 4 vs. 4 cases, the
computational time on average shows a quadratic growth in these cases. The computational time decreases
for the 5 vs. 5 case because the 5 vs. 5 engagement is different from, and simpler than, the one used
for the other cases, which all followed a similar pattern. Clearly the computational time depends on
the degree of interaction and the involvement of the units in battle, and the 5 vs. 5 case confirms the
expected decrease in computation time for a simpler scenario.
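
One simple way to check a growth claim of this kind against recorded timings is to estimate the exponent of a power law by least squares on a log-log scale. The Python sketch below does this; the function name is hypothetical and the timing values are synthetic (generated from an exact quadratic), not measurements from these experiments.

```python
import math

# Sketch only. The data below are synthetic, generated as time = 3 * n^2,
# and are NOT the timings recorded in the experiments.

def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size).
    A slope near 2 indicates quadratic growth; near 1, linear growth."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

units = [4, 6, 8]                                  # total units for 2 vs. 2 to 4 vs. 4
synthetic_times = [3.0 * n ** 2 for n in units]    # hypothetical quadratic timings
exponent = loglog_slope(units, synthetic_times)
```

The same estimator applied to computation time against mission duration would return a slope near 1 for linear growth.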

Figure 5.11: The Computational Time Changes As The Number Of Units Is Increased (plot title:
Computational Time Changes as the Number of Units is Increased from 2 to 10; horizontal axis: Number
of Units)

Figure 5.12: The Number Of Iterations Changes As The Number Of Units Is Increased

Figure 5.13: The Computational Time Changes As The Mission Duration Is Increased (plot title:
Computational Time Changes as the Mission Duration is Increased from 10 min. to 50 min.)


5.4  Experiment  5.2:  The  Mission  Duration  Is  Increased  While 
The  Number  Of  Units  Is  Kept  Constant 

In this set of experiments, the number of units is fixed at 2. For each experiment, the mission
duration varies from 10 to 50 minutes. For the convergence test, the control change is set to 0.01 and
the step size is 0.5. Figure 5.13 records the changes in the computational time as the mission duration is
increased. The computational time increases roughly linearly with the mission duration.


5.5  Conclusions 

The computational time required to reach the convergence criterion depends on many factors, such as
the unit categories, the number of units, the initial trajectories, the weights in the cost function, the
step size in our numerical procedure and the manner of engagement, as well as the initial positions and
target locations. Similarly, the number of iterations required to reach the convergence criterion depends
on the same factors. From our experimental results, the major factors that affect the computational time
are the number of units and the mission duration. For our experiments, the computational time of the
controller increased quadratically as a function of the number of units and linearly as a function of the
mission duration, while the number of iterations itself remained relatively constant as a function of the
number of units.
Chapter 6


Experiment 6: Controller with a Kalman Filter for Estimation

6.1  Executive  Summary 

In  this  chapter,  we  present  an  algorithm  based  on  the  Extended  Kalman  Filter  (EKF)  for  state  estimation 
when  enemy  inputs  are  unavailable.  We  show  the  overall  structure  of  the  estimation  scheme  through  a 
block  diagram.  We  present  the  implementation  of  the  algorithm  for  the  air  operation  theater  through  a 
flowchart.  We  also  present  the  results  of  simulation  experiments. 


6.2  Purpose  of  the  Experiment 


The  purpose  of  the  experiment  is  to  show  that  the  current  differential  game  technology,  combined  with 
an  extended  Kalman  filter  provides  an  effective  means  of  countering  the  enemy  actions  under  idealized 
situations  with  perfect  information  about  enemy  initial  conditions  and  objectives,  but  with  noisy  mea¬ 
surements  of  a  subset  of  the  enemy  state. 

Description:  Both  the  plant  and  internal  models  are  the  same,  i.e.,  the  MDCM  (Mission  Dynamics 
Continuous  Model).  Increasing  levels  of  noise  will  be  added  to  the  state  variables  when  constructing 
the  observed  state  variables  (the  output  variables).  Some  of  the  enemy  state  variables  (weapons  per 
platform  first,  and  number  of  adversary  platforms  next)  will  be  removed  from  the  set  of  output  variables 
thus  making  them  unobservable.  The  control  actions  of  the  Blue  and  Red  teams  are  generated  by  the 
proposed  game  theoretic  algorithm. 


6.3  Hypothesis  to  Prove  or  Disprove 

The  current  differential  game  technology,  combined  with  an  extended  Kalman  filter  provides  an  effective 
means  of  countering  the  enemy  actions  under  idealized  situations  with  perfect  information  about  enemy 
initial  conditions  and  objectives,  but  with  noisy  measurements  of  a  subset  of  the  enemy  state.  The 
algorithm  based  on  EKF  adequately  estimates  the  unknown  red  state  in  the  presence  of  process  and 
observation  noise. 
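
For readers unfamiliar with the filter, the predict and update steps of an EKF can be sketched for a scalar state as follows. This is a deliberately simplified Python illustration, not the report's filter: it does not estimate unknown enemy inputs, and the toy attrition dynamics, the parameter values and all names are assumptions introduced here.

```python
import random

# Sketch only. Assumptions (not from the report): a scalar toy model
# x[k+1] = x[k] - DT * A * x[k]^2 with a direct noisy measurement y = x + noise.
A = 0.01    # hypothetical attrition coefficient
DT = 1.0    # hypothetical time step

def f(x):
    """Toy nonlinear state transition."""
    return x - DT * A * x * x

def f_jacobian(x):
    """Derivative of f with respect to x, used to propagate the variance."""
    return 1.0 - 2.0 * DT * A * x

def ekf_step(x_est, p_est, y, q, r):
    """One EKF cycle: predict with the model, then correct with measurement y."""
    x_pred = f(x_est)
    jac = f_jacobian(x_est)
    p_pred = jac * p_est * jac + q        # predicted error variance
    gain = p_pred / (p_pred + r)          # Kalman gain (measurement Jacobian = 1)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new

def run(noise_std, steps=20, seed=0):
    """Simulate the toy plant and filter; returns (true state, estimate)."""
    rng = random.Random(seed)
    x_true, x_est, p_est = 10.0, 5.0, 4.0   # the estimate starts deliberately wrong
    for _ in range(steps):
        x_true = f(x_true)
        y = x_true + rng.gauss(0.0, noise_std)
        x_est, p_est = ekf_step(x_est, p_est, y, q=0.01, r=0.25)
    return x_true, x_est

x_true_clean, x_est_clean = run(noise_std=0.0)
x_true_noisy, x_est_noisy = run(noise_std=0.5)
```

In the noise-free run the estimate converges to the true state from a wrong initial guess; the noisy run shows the same structure with measurement noise added.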


6.4  Experiment  Setup  and  Experiment  Design 

In this report, we present an algorithm based on the Extended Kalman Filter (EKF) for state estimation
when enemy inputs are unavailable. We show the overall structure of the estimation scheme through a
block diagram. We present the implementation of the algorithm for the air operation theater through a
flowchart. We also present simulation results. The theoretical description of the EKF is given in [1] and
[2]. The block diagram of the approach is shown in Fig. 6.1.

The inputs to the filter are the output vector $y_k$ of the plant and the input vector for the friendly
unit; the outputs of the filter are estimates $\hat{x}_k$ and $\hat{u}_k^R$ of the state vector $x_k$ and
the enemy input vector $u_k^R$.

Figure 6.1: Block diagram of the Extended Kalman Filter (block label: Extended Kalman filter for
estimating state $x_k$ and enemy input $u_k$)

The flowchart of the algorithm is given in Fig. 6.2.

Figure 6.2: The flowchart of the estimation algorithm

The EKF has been combined with the Simulink game theoretic controller scheme to test the filter as
a closed-loop device, as in Fig. 6.3.

Figure 6.3: Closed-loop EKF combined with the game theoretic controller

6.5 Experiment Results and Analysis

I- NO NOISE (v=0)

The blue states and the red states (observed (solid) and estimated (dotted)) are presented in Fig. 6.4.

Figure 6.4: Blue states, and red states (observed (solid) and estimated (dotted)), no noise.

II- MAXIMUM SENSOR NOISE

The maximum size of Gaussian sensor noise is considered. The maximum-size random noise is 1% of the
operating point for the blue states, and 5% of the operating point for the red states. The blue states and
the red states (observed (solid) and estimated (dotted)) are presented in Fig. 6.5.

Figure 6.5: Blue states, and red states (observed (solid) and estimated (dotted)), maximum noise.


For a sample game, the results are given below. The enemy control input is given manually. The goals
in the game are as follows:

The blue interceptors try to kill as many red bombers as possible and to reach the target, and the
red bombers try to preserve their own platforms and to reach the target.

Figure 6.6: Trajectories of units


6.6  Conclusions  and  Recommendations 

The EKF algorithm is capable of estimating the states in the presence of process noise as well as sensor
noise of different sizes. The estimates of the enemy inputs are too noisy to be useful in themselves;
however, only the state estimates are needed.

The Matlab code for the EKF has also been combined with the Simulink block for the game theoretic
controller.

The controller scheme uses the state estimates rather than the observed states.

The scheme has been implemented successfully for both the open-loop case and the closed-loop case.

As $\eta^R$ is not observed, a small error in the estimation may occur. The estimated enemy inputs
(velocities and firing intensity) are used only for the state estimates, not for feedback. Therefore, the
fluctuations in the input estimates do not cause much error in the estimated states.
Chapter 7


Experiment 7: Controller Applied to a More Realistic Plant

7.1  Executive  Summary 

The purpose of this experiment is to observe the effect of the discrepancy between internal and plant
models in a closed-loop setting. The internal model is a reduced-order ODE model, called the Mission
Dynamics Continuous-time Model (MDCM 3.0), and the plant model is a full-order ODE model,
abbreviated as EPMDM, which exactly describes the evolution of expected values in PMDM.

The hypothesis is that the current differential game technology would provide an effective means
of countering the actions of the enemy, who may be either following the Nash solution or using some
simple heuristic strategy, when noise-free state measurements are available, in spite of the mismatch
between the plant and the internal models.

It is concluded that approximating the plant model with a lower-order internal model does not cause a
significant difference in game results, as long as the engagement terminates before one side is completely
wiped out.


7.2  Purpose  of  the  Experiment 

The  purpose  of  this  experiment  is  to  observe  the  effect  of  the  discrepancy  between  internal  and  plant 
models  in  a  closed-loop  setting.  The  internal  model  is  a  reduced-order  ODE  model,  called  the  Mission 
Dynamics  Continuous-time  Model  (MDCM  3.0)  (see  Chapter  1  and  its  appendix),  and  the  plant  model  is 
a  full  order  ODE  model,  abbreviated  as  the  EPMDM,  which  exactly  describes  the  evolution  of  expected 
values  in  the  PMDM  [2]  (see  also  Chapter  1). 

In  the  Experiment  Plan  [1]  the  hypothesis  for  Experiment  7  is  stated  as: 


The  current  differential  game  technology  provides  an  effective  means  of  countering  enemy 
actions  under  more  realistic  situations  with  perfect  information  about  the  enemy. 

To  understand  this  statement,  we  remark  that  the  current  controller  for  our  system  is  based  on  an 
approximation  of  the  actual  model,  as  described  in  [2].  When  the  dynamics  of  the  plant  are  the  same  as 
the  internal  model  used  to  compute  the  Nash  solution,  we  know  that  the  current  controller  is  effective. 
We  now  test  if  this  controller  is  still  effective  when  the  plant  dynamics  are  more  realistic,  while  the 
controller  is  based  on  approximated  dynamics. 

7.3 Hypothesis to Prove or Disprove

The  hypothesis  is  that  the  current  differential  game  technology  would  provide  an  effective  means  of 
countering  the  enemy  actions,  who  may  be  either  following  the  Nash  solution  or  using  some  simple 
heuristic  strategy,  when  noise-free  state  measurements  are  available,  in  spite  of  the  mismatch  between 
the  plant  and  the  internal  models. 


7.4  Experiment  Setup 

To  implement  more  realistic  game  dynamics,  we  use  the  Probabilistic  Mission  Dynamics  Model  (PMDM) 
for  uncoordinated  target  selection  as  derived  in  [2].  In  [2],  the  model  is  derived  only  for  combat  between 
two  opposing  units,  therefore  our  experiments  will  only  be  for  two  opposing  units  (Blue  and  Red).  First 
we  summarize  the  PMDM. 

The numbers of platforms depend on chance occurrences and are defined by the random variables

$X^B(t)$: number of platforms in the Blue unit at time $t$,

$X^R(t)$: number of platforms in the Red unit at time $t$.

The initial values are known to be $X^B(0) = N^B$, $X^R(0) = N^R$. At any given time, the commanders (or
controllers) can observe only the expected values, denoted by

\[ \eta^B(t) := E[X^B(t)] \quad \text{and} \quad \eta^R(t) := E[X^R(t)]. \]

Then, using the standard notation

\[ \Pi_{n,m}(t) = P\{X^B(t) = n,\; X^R(t) = m\} \]

for a Markov process, the evolution of the state probabilities is described by the differential equation

\[ \dot{\Pi}(t) = \Pi(t)\,Q(t), \]

where we stack all components $\Pi_{n,m}$ into a vector $\Pi$, and $Q$ is called the transition rate
matrix of the process. Considering just the Red unit firing on the Blue unit, the loss rate of Blue
platforms is defined by

\[ \lambda^B(t) := \rho^R P_k^R\, \phi(\|\xi^B - \xi^R\|)\, \pi^R, \]

where $\rho$ is the acquisition rate, $P_k$ is the probability of kill, $\phi$ is a function depending on
the distance between the units and $\pi$ is the fire intensity. Then the more realistic plant model,
derived in detail in [2], is

\[ \frac{d}{dt}\eta^B = -\lambda^B \eta^R + \lambda^B \sum_{m=1}^{N^R} m\,\Pi_{0,m}, \tag{7.1} \]

\[ \frac{d}{dt}\eta^R = -\lambda^R \eta^B + \lambda^R \sum_{n=1}^{N^B} n\,\Pi_{n,0}. \tag{7.2} \]

We refer to (7.1) and (7.2) as the evolution of the expected values in the PMDM (EPMDM). The EPMDM
is the plant model and is implemented in the game technology software as shown in Figure 7.1, which
is the Simulink file that simulates the game dynamics. The only change in the software from the MDCM
to the EPMDM is the grey box in Figure 7.1 labeled "Markov Chain Model". This box calculates the
summation terms in (7.1) and (7.2).


Whenever the game needs to recompute the Nash solution, we use the approximated dynamics in
MDCM 3.0, in Appendix L12

\[ \frac{d}{dt}\eta^B = -\lambda^B \eta^R, \tag{7.3} \]

\[ \frac{d}{dt}\eta^R = -\lambda^R \eta^B. \tag{7.4} \]

That is, we just drop the summation terms in (7.1) and (7.2). The summation terms correspond to
having zero surviving platforms at a certain time t. This probability, for either side, will be small at the
beginning of the engagement, but it may grow later. We call (7.3) and (7.4) the internal model, which is
just the MDCM.
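
The relationship between the two models can be sketched numerically. The Python example below integrates the state probabilities of a two-unit attrition Markov chain with forward Euler steps and, alongside it, the reduced ODE model without the summation terms. It is a minimal sketch under stated assumptions: the loss rates are held constant (in the report they depend on distance and fire intensity), each surviving platform is assumed to attrit the opposing unit at that rate, and the rate values, step size and function names are hypothetical.

```python
# Sketch only. Assumptions (not from the report): constant loss rates lam_b and
# lam_r, forward-Euler integration, and hypothetical numerical values.

def chain_step(prob, lam_b, lam_r, dt):
    """One Euler step of the state probabilities; prob maps (n, m) to probability."""
    new = dict(prob)
    for (n, m), p in prob.items():
        if n > 0 and m > 0 and p > 0.0:
            blue_loss = lam_b * m * p * dt     # flow (n, m) -> (n - 1, m)
            red_loss = lam_r * n * p * dt      # flow (n, m) -> (n, m - 1)
            new[(n, m)] -= blue_loss + red_loss
            new[(n - 1, m)] = new.get((n - 1, m), 0.0) + blue_loss
            new[(n, m - 1)] = new.get((n, m - 1), 0.0) + red_loss
    return new

def expected(prob):
    """Expected numbers of Blue and Red platforms under the distribution prob."""
    eta_b = sum(n * p for (n, m), p in prob.items())
    eta_r = sum(m * p for (n, m), p in prob.items())
    return eta_b, eta_r

def simulate(n_b=10, n_r=10, lam_b=0.05, lam_r=0.05, dt=0.01, t_end=2.0):
    """Return exact expected values, reduced-ODE values and the final distribution."""
    prob = {(n_b, n_r): 1.0}
    ode_b, ode_r = float(n_b), float(n_r)
    for _ in range(int(round(t_end / dt))):
        prob = chain_step(prob, lam_b, lam_r, dt)
        # Reduced model: the summation (zero-platform) terms are dropped.
        ode_b, ode_r = ode_b - dt * lam_b * ode_r, ode_r - dt * lam_r * ode_b
    return expected(prob), (ode_b, ode_r), prob

exact, reduced, final_prob = simulate()
```

Early in the engagement the two sets of expected values are nearly identical, because the probability of either side having zero platforms is negligible. Running the sketch with a much larger t_end lets that probability grow, and the two models then separate.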

7.5  Experiment  Results 

To  test  the  hypothesis,  we  must  compare  the  game  solutions  when  the  plant  model  is  the  EPMDM  and 
the  MDCM,  based  on  the  cost  components  and  the  total  game  cost.  If  the  costs  do  not  differ  depending 
on  the  plant  model  we  use,  then  the  hypothesis  will  be  verified. 

By  hypothesis  we  do  not  need  to  add  noise  to  the  state  variables  when  constructing  observed  state 
variables,  so  our  experiments  rely  on  perfect  information.  To  perform  the  experiments,  the  Blue  unit 
always  uses  the  game  theoretic  algorithm  (based  on  the  MDCM),  and  the  Red  unit  uses  the  following 
strategies: 

Strategy  A:  following  the  Nash  solution  (based  on  the  MDCM), 

Strategy  B:  a  simple  heuristic  deterministic  strategy. 
The heuristic strategy is one for which the Red unit takes an assigned path and has an assigned time at
which to fire weapons, and Red maintains this course of action no matter what Blue does.

We  also  use  two  different  scenarios,  the  cross  scenario  and  the  joust  scenario.  The  cross  and  joust 
scenarios  are  summarized  in  Table  7.1.  The  weights  are  for  the  quadratic  cost  function  for  the  nonlinear 
game,  as  described  in  [3]. 


Table 7.1: Scenario Description

                                     cross              joust
                                     Blue     Red       Blue     Red
Number of Units                      1        1         1        1
Number of Platforms                  10       10        10       10
Pk                                   0.8      0.8       0.8      0.8
ρ                                    0.5      0.5       0.5      0.5
ξ1(0) (km)                           20.0     50.0      20.0     80.0
ξ2(0) (km)                           50.0     80.0      50.0     52.0
Weight: Distance to Target Cost      0.1      0.1       0.1      0.1
Weight: Running Platform Cost        0.01     3.0       0.2      20.0
Weight: Speed Cost                   200.0    200.0     200.0    200.0
Weight: Terminal Platform Cost       0.0      0.0       0.0      0.0
Weight: Terminal Target Cost         0.0      0.0       0.0      0.0
Weight: Terminal Speed Cost          0.0      0.0       0.0      0.0

The figures are organized as follows: Figures 7.2-7.7 are for Strategy A using the cross scenario,
Figures  7.8-7.13  are  for  Strategy  B  using  the  cross  scenario,  Figures  7.14-7.19  are  for  Strategy  A  using 
the  joust  scenario  and  last,  Figures  7.20-7.25  are  for  Strategy  B  using  the  joust  scenario.  In  these  four 
experiments,  the  figures  compare  the  trajectories  computed  by  the  game  when  the  plant  model  is  EPMDM 
and  when  the  plant  model  is  MDCM.  They  also  compare  the  game  solution  for  number  of  platforms, 
speed  controls,  fire  intensities  and  weapons  expenditures.  Note  that  the  compared  trajectories  are  very 
close,  so  their  plots  are  indistinguishable  in  most  of  the  figures. 


7.6  Analysis 

Examining the figures, we can see that the game solutions do not appear to differ significantly. This result
is not surprising since the number of platforms on either side decreases only slightly, and we know that
MDCM differs noticeably from EPMDM only when one side approaches zero platforms (as mentioned
above and in [2]).

To compare the game solutions with different plant models, we compare the cost components of the
quadratic cost function and the actual game costs. Tables 7.2 and 7.3 summarize these costs for the
experiments using the cross scenario, and Tables 7.4 and 7.5 summarize these costs for the experiments
using the joust scenario. The terminal cost components, which are zero, are omitted from the tables.
These results show that the difference between using the EPMDM and the MDCM for the plant is
insignificant.
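One way to quantify "insignificant" is the relative gap between the game costs obtained with the two plant models. The sketch below uses the game-cost values reported in Tables 7.3 and 7.5; the metric `relative_gap` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# Game costs copied from Tables 7.3 (cross) and 7.5 (joust) as (EPMDM, MDCM) pairs.
game_costs = {
    ("cross", "Game"): (-4545.7, -4545.5),
    ("cross", "Heuristic"): (-7648.1, -7644.7),
    ("joust", "Game"): (-5986.6, -5987.2),
    ("joust", "Heuristic"): (-5604.7, -5604.9),
}


def relative_gap(epmdm, mdcm):
    """Relative difference of the MDCM cost with respect to the EPMDM cost.

    This metric is an assumption made for illustration only.
    """
    return abs(mdcm - epmdm) / abs(epmdm)


gaps = {key: relative_gap(*pair) for key, pair in game_costs.items()}
for (scenario, red_control), gap in gaps.items():
    print(f"{scenario:5s} {red_control:9s} relative gap = {gap:.2e}")
```

Under this metric, every one of the four experiments has a gap below 0.1 percent of the game cost.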


Table 7.2: Cost Components of Objective Function Using cross Scenario

Blue
Plant Model   Red Control   Running Platform Cost   Running Control Cost
EPMDM         Game                        -8172.7                 1314.6
MDCM          Game                        -8172.6                 1316.5
EPMDM         Heuristic                   -6181.1                 1788.9
MDCM          Heuristic                   -6184.0                 1789.3

Red
Plant Model   Red Control   Running Platform Cost   Running Control Cost
EPMDM         Game                        -3164.7                  852.3
MDCM          Game                        -3164.5                  853.9
EPMDM         Heuristic                   -2088.1                 5344.0
MDCM          Heuristic                   -2094.0                 5344.0

Table 7.3: Total Costs of Objective Function Using cross Scenario

Plant Model   Red Control   Blue Total Cost   Red Total Cost   Game Cost
EPMDM         Game                  -6858.1           2321.4     -4545.7
MDCM          Game                  -6586.1           2310.6     -4545.5
EPMDM         Heuristic             -4392.2          -3255.9     -7648.1
MDCM          Heuristic             -4394.7          -3250.0     -7644.7

Table 7.4: Cost Components of Objective Function Using joust Scenario

Blue
Plant Model   Red Control   Running Platform Cost   Running Control Cost
EPMDM         Game                        -7703.0                 2117.8
MDCM          Game                        -7698.9                 2123.3
EPMDM         Heuristic                   -7141.5                 2788.0
MDCM          Heuristic                   -7145.5                 2787.9

Red
Plant Model   Red Control   Running Platform Cost   Running Control Cost
EPMDM         Game                         1793.3                -2194.7
MDCM          Game                         1787.0                -2198.6
EPMDM         Heuristic                    4192.8                -5444.0
MDCM          Heuristic                    4192.7                -5444.0

Table 7.5: Total Costs of Objective Function Using joust Scenario

Plant Model   Red Control   Blue Total Cost   Red Total Cost   Game Cost
EPMDM         Game                  -5585.2           -401.4     -5986.6
MDCM          Game                  -5575.6           -411.6     -5987.2
EPMDM         Heuristic             -4353.5          -1251.2     -5604.7
MDCM          Heuristic             -4353.6          -1251.3     -5604.9


7.7  Conclusions  and  Recommendations

With perfect measurement and under closed-loop control, these experiments show that the reduced-order
model, MDCM, provides an effective approximation to the more realistic situation, EPMDM. However,
we remark that in [2] it was shown that as one side's platforms go to zero, the approximation may not
be as good. That is, in scenarios for which one side has zero surviving platforms, the approximation
model MDCM may be less accurate. Also, the experiments show that approximating the plant model
with a lower-order internal model does not cause a significant difference in game results, when the
adversary uses either a game theoretic controller or a heuristic controller.
[Figures omitted. Recoverable caption fragments: "... (cross): Number of Platforms"; "7.12: Strategy B (cross): Speed Control"; "... Strategy B (cross): Weapons Expenditures"; "... (joust): Number of Platforms"; "... B (joust): Fire Intensity"; "... (joust): Speed Control"; "... (joust): Weapons Expenditures".]
Chapter 8


Experiment 8: All Quadratic Method for Nash Computation

8.1  Executive  Summary 

The purpose of Experiment 8 is to develop, implement and test the Sequential Quadratic-Quadratic
Method (SQQM) for Differential Games, with two hypotheses of interest. The first hypothesis tests
whether the Nash solution computed through the Sequential Quadratic-Quadratic Method is identical
to the one found using the Sequential Linear-Quadratic Algorithm (SLQM); the second hypothesis tests
whether there is an improvement in convergence time. The issue of speed can become of great importance
in real-time applications; moreover, due to the presence of nonlinearity and constraints, a different
approach serves the purpose of validating previous results. The algorithm is based on an iterative method
for computing a Nash solution to a zero-sum differential game with a system of nonlinear differential
equations.

Several experiments on different scenarios, based on both Model 2 and Model 3, have shown the
convergence of the outputs of the SQQM and SLQM algorithms to the same solution. So the first
hypothesis of Experiment 8 is proven true. As for the second hypothesis, namely an improvement in
convergence speed, the conclusion is that the SQQM alone proves to be fast in simple scenarios. If,
however, the starting trajectory and costate estimates are too far from the optimal solution, the SLQM
may be used at first, with a switch to the SQQM once the solution estimate is closer to the optimal
solution. In more complex cases, it is thus advantageous to blend the linear-quadratic algorithm and
the quadratic-quadratic algorithm, taking advantage of both the superior stability of the SLQM and the
superior speed of the SQQM.


8.2  Purpose  of  the  Experiment 

The purpose of Experiment 8 is to develop, implement and test the Sequential Quadratic-Quadratic
Method (SQQM) for Differential Games, with the main goal of exploring the possibility of reducing the
computational time with respect to the Sequential Linear-Quadratic Algorithm (SLQM).

The algorithm is based on an iterative method for computing a Nash solution to a zero-sum differential
game for a system of nonlinear differential equations. Given a solution estimate, a subproblem is defined
which approximates the original problem around the previous solution estimate with quadratic system
dynamics; it is then replaced with another subproblem which has a quadratic cost and linear dynamics.
Because the latter subproblem has only linear dynamics, a Riccati equation method can be applied
to compute the Nash solution to the subproblem. By adding this Nash solution to the current solution
estimate for the original game, a new solution estimate is obtained. Repeating this process, it is possible
to successively generate better solution estimates that converge to the Nash solution of the original
differential game.
8.3  Hypotheses  to  Prove  or  Disprove

There are two hypotheses of interest: the first one tests whether the Nash solution computed through
the Sequential Quadratic-Quadratic Method is identical to the one found using the Sequential
Linear-Quadratic Algorithm; the second one tests whether there is an improvement in convergence time.

The issue of speed can become of great importance in real-time applications; moreover, due to the
presence of nonlinearity and constraints, a different approach serves the purpose of validating previous
results. The introduction of quadratic terms in the approximation of the plant model along the reference
trajectory is taken into account by adjoining it to the cost function expression, by the classical use of an
additional costate. On the one hand, the improved approximation reduces the number of iterations
required by the sequential algorithm; on the other hand, each iteration requires a longer time to complete
due to the increase in the order of the model involved. As a result, the Quadratic-Quadratic Algorithm may
provide better performance when the scenario is highly nonlinear. When the nonlinearities do not play
a major role, the Linear-Quadratic Algorithm may perform better; this happens, for example, when the
initial guess of the costate for the SQQM is far from the optimal one, thus requiring extra time before the
quadratic algorithm can actually start converging. Therefore, an algorithm has also been implemented
that blends the SQQM and SLQM strategies: after starting with the linear algorithm, a
test is routinely made to check whether the quadratic algorithm may take over. This takeover should
happen when the linear algorithm generates a solution estimate that is sufficiently close to the optimal
one.


8.4  Experiment  Setup 

In this section, we report in detail the mathematical formulation of the problem at hand and the
implementation of the Sequential Quadratic-Quadratic Method.

8.4.1  Problem  and  Nash  Solutions 

Let $U$ denote the set of $\mathbb{R}^m$-valued continuous functions on $[t_0, t_f]$. Consider a system governed by the
ordinary differential equation,

$$ \frac{d}{dt} x(t) = f(x(t), u(t)), \quad t \in [t_0, t_f]; \quad x(t_0) = z_0, \tag{8.1} $$

where $f(x,u)$ is an $\mathbb{R}^n$-valued $C^2$-class function on $\mathbb{R}^n \times \mathbb{R}^m$. Given any control $u \in U$ and an initial
state $x(t_0) = z_0$, it is assumed that equation (8.1) defines a unique continuously differentiable solution
$x(t)$, $t \in [t_0, t_f]$. The solution $x$ is called the trajectory of the system produced by control $u$ starting from
the initial state $z_0$ and it is also denoted by $x[u] \in X$, where $X$ is the space of continuously differentiable
$\mathbb{R}^n$-valued functions on $[t_0, t_f]$.

Consider the following game problem. The control function $u$ consists of two parts, $u^B$ and $u^R$,
corresponding to the two forces, the Blue and the Red: $u = (u^B, u^R)$. As the cost function, consider

$$ J(u) = J(u^B, u^R) = \int_{t_0}^{t_f} g(x(t), u(t))\, dt + g_f(t_f, x(t_f)), \tag{8.2} $$

where $g$ and $g_f$ are general nonquadratic functions and are $C^2$-class functions on $[t_0, t_f]$. It is sometimes
convenient to consider $J(u)$ as a function of both $u$ and $x$ with an additional constraint (8.1) connecting
$u$ and $x = x[u]$, i.e., $J(u) = J[x[u]; u]$. The overall game is expressed as the following minimax problem:

$$ J^*(t_0, z_0) := \min_{u^B} \max_{u^R} \left\{ J(u^B, u^R) \;\middle|\; \frac{d}{dt} x(t) = f(x(t), u(t)),\ x(t_0) = z_0 \right\}, \tag{8.3} $$

where the Red force tries to maximize the cost function $J(u^B, u^R)$ and the Blue force tries to minimize
the same cost function $J(u^B, u^R)$.
The control function $u^* = (u^{*B}, u^{*R})$ satisfying the inequalities

$$ J(u^{*B}, v^R) \le J(u^{*B}, u^{*R}) \le J(v^B, u^{*R}) \tag{8.4} $$

for any $v = (v^B, v^R)$ in a neighborhood of $u^*$ is called a (local) Nash solution to the game problem (8.3).
The optimal value $J^*(t_0, z_0) = J(u^*)$ of the cost function $J(u^B, u^R)$ is called the value of the game and
it depends on the initial time $t_0$ and the initial state $z_0$.
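The saddle-point inequalities (8.4) can be checked numerically on a toy problem. The sketch below is a static, scalar stand-in for the differential game: the cost function `cost`, the grid radius and the tolerance are hypothetical choices made only to illustrate the definition.

```python
def cost(u_blue, u_red):
    """Toy static zero-sum cost: convex in u_blue, concave in u_red (hypothetical)."""
    return (u_blue - 1.0) ** 2 - (u_red + 2.0) ** 2 + 3.0


def is_local_saddle(J, u_blue_star, u_red_star, radius=0.5, samples=21, tol=1e-12):
    """Check J(uB*, vR) <= J(uB*, uR*) <= J(vB, uR*) on a grid around the candidate."""
    j_star = J(u_blue_star, u_red_star)
    offsets = [radius * (2.0 * k / (samples - 1) - 1.0) for k in range(samples)]
    # Red (the maximizer) must not be able to raise the cost by deviating alone.
    red_cannot_gain = all(J(u_blue_star, u_red_star + d) <= j_star + tol for d in offsets)
    # Blue (the minimizer) must not be able to lower the cost by deviating alone.
    blue_cannot_gain = all(J(u_blue_star + d, u_red_star) >= j_star - tol for d in offsets)
    return red_cannot_gain and blue_cannot_gain


saddle_ok = is_local_saddle(cost, 1.0, -2.0)
not_a_saddle = is_local_saddle(cost, 0.5, -2.0)
```

For this toy cost, the pair (1, -2) passes the check and the value of the game is `cost(1.0, -2.0)`, while (0.5, -2) fails because Blue can lower the cost unilaterally.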

The proposed iterative process for computing a Nash solution is of the form

$$ u_{i+1} = u_i + \alpha_i\, \delta u_i \tag{8.5} $$

with a step size $\alpha_i \in (0, 1]$. Here, $\delta u_i$ is a solution to the $i$-th subproblem which is obtained by applying
quadratic (or linear) approximations to $g$, $g_f$ and $f$ of the original differential game (see Section 3). The
following simple proposition suggests that it makes sense to consider the iterative process of the form
(8.5) for computing a Nash solution. Here, we consider the simplest iterative process which was proposed
in [5] and [6]. The proof of the following proposition can readily be obtained by checking the first order
necessary conditions for Nash equilibrium (8.4). In the following, transposition is denoted by a prime and
the second-order partial derivatives of the function $g(x,u)$ by $g_{xx}(x,u)$, $g_{xu}(x,u)$, $g_{ux}(x,u)$, and $g_{uu}(x,u)$.

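Proposition 1. Suppose that the control $u^* = (u^{*B}, u^{*R})$ is a Nash solution to the problem (8.3). Let
$x^* = x[u^*]$ denote its state trajectory. Then the zero solution $(\delta u^B, \delta u^R) = (0, 0)$ is a solution to the
subproblem:

$$ \min_{\delta u^B} \max_{\delta u^R} \Bigg\{ \int_{t_0}^{t_f} \Big[ g(x^*,u^*) + g_x(x^*,u^*)\,\delta x + g_u(x^*,u^*)\,\delta u + \tfrac12\, \delta x'\, g_{xx}(x^*,u^*)\, \delta x $$
$$ \qquad + \tfrac12\, \delta u'\, g_{ux}(x^*,u^*)\, \delta x + \tfrac12\, \delta x'\, g_{xu}(x^*,u^*)\, \delta u + \tfrac12\, \delta u'\, g_{uu}(x^*,u^*)\, \delta u \Big]\, dt $$
$$ \qquad + (g_f)_x(x^*(t_f))\, \delta x(t_f) + \tfrac12\, \delta x(t_f)'\, (g_f)_{xx}(x^*(t_f))\, \delta x(t_f) $$
$$ \qquad \Bigg|\ \frac{d}{dt}\delta x = f_x(x^*,u^*)\,\delta x + f_u(x^*,u^*)\,\delta u, \quad \delta x(t_0) = 0 \Bigg\}, \tag{8.6} $$

where, in the interest of brevity, the time $t$ is suppressed.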


Observe that the cost function in the above subproblem (8.6) is the quadratic approximation of the
original cost function in (8.3) around the Nash solution $(x^*, u^*)$ and that the linear differential equation
in the above subproblem (8.6) is the linear approximation of the original differential equation in (8.3)
around the Nash solution $(x^*, u^*)$. We now define the Hamiltonian $H$ for the differential game (8.3):

$$ H(x(t), u(t), \lambda(t)) = g(x(t), u(t)) + \lambda(t)' f(x(t), u(t)), \tag{8.7} $$

where $\lambda \in X$.

Here, it is assumed for all time $t$

$$ g_{u^B u^R}(x(t), u^B(t), u^R(t)) = 0 \quad \text{and} \quad g_{u^R u^B}(x(t), u^B(t), u^R(t)) = 0 \tag{8.8} $$

and

$$ f_{u^B u^R}(x(t), u^B(t), u^R(t)) = 0 \quad \text{and} \quad f_{u^R u^B}(x(t), u^B(t), u^R(t)) = 0. \tag{8.9} $$

Roughly speaking, assumption (8.8) states that there are no cross product terms between Blue and Red
controls in the cost. Similarly, assumption (8.9) states that there are no cross product terms between
Blue and Red controls in the right-hand side of the differential equations. With these assumptions, the
following conditions hold for any time $t$:

$$ H_{u^B u^R}(x^*(t), u^{*B}(t), u^{*R}(t), \lambda^*(t)) = 0 \quad \text{and} \quad H_{u^R u^B}(x^*(t), u^{*B}(t), u^{*R}(t), \lambda^*(t)) = 0. \tag{8.10} $$
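The step from assumptions (8.8) and (8.9) to condition (8.10) is a direct differentiation of the Hamiltonian (8.7). Written out as a short sketch, with $f^{(j)}$ and $\lambda^{(j)}$ denoting the $j$-th components of $f$ and $\lambda$ (a component notation assumed here for the derivation):

```latex
H_{u^B u^R}(x,u,\lambda)
  = g_{u^B u^R}(x,u) + \sum_{j=1}^{n} \lambda^{(j)} f^{(j)}_{u^B u^R}(x,u)
  = 0 + \sum_{j=1}^{n} \lambda^{(j)} \cdot 0
  = 0 .
```

The same computation with the order of differentiation exchanged gives $H_{u^R u^B} = 0$. Since the identity holds for every $(x, u, \lambda)$, it holds in particular along $(x^*, u^*, \lambda^*)$, so the Blue and Red controls do not interact in the second-order terms of the Hamiltonian.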
8.4.2  Sequential  Quadratic-Quadratic  Method

It is assumed that a solution estimate $u_i = (u_i^B, u_i^R)$ is available and we successively improve it. Let us
introduce some notation. Let $x_i = x[u_i]$ be the trajectory of (8.1) corresponding to $u_i$ with $x_i(t_0) = z_0$.
Let $\delta u = (\delta u^B, \delta u^R)$ be a small perturbation of $u_i$ and $x[u_i + \delta u]$ be the trajectory corresponding to control
$u_i + \delta u$ with initial condition $x[u_i + \delta u](t_0) = z_0$.

In [5] and [6], we proposed an iterative process, whose $i$-th step consists of solving the following
subproblem, in which the original differential equation (8.1) is linearized around the $i$-th solution estimate
$(u_i, x_i)$ and the original cost function $J$ is approximated by a quadratic function around the $i$-th solution
estimate $(u_i, x_i)$:

$$ \min_{\delta u^B} \max_{\delta u^R} \Bigg\{ \int_{t_0}^{t_f} \Big[ g(x_i,u_i) + g_x(x_i,u_i)\,\delta x + g_u(x_i,u_i)\,\delta u + \tfrac12\, \delta x'\, g_{xx}(x_i,u_i)\, \delta x $$
$$ \qquad + \tfrac12\, \delta u'\, g_{ux}(x_i,u_i)\, \delta x + \tfrac12\, \delta x'\, g_{xu}(x_i,u_i)\, \delta u + \tfrac12\, \delta u'\, g_{uu}(x_i,u_i)\, \delta u \Big]\, dt $$
$$ \qquad + g_f(x_i(t_f)) + (g_f)_x(x_i(t_f))\, \delta x(t_f) + \tfrac12\, \delta x(t_f)'\, (g_f)_{xx}(x_i(t_f))\, \delta x(t_f) $$
$$ \qquad \Bigg|\ \frac{d}{dt}\delta x = f_x(x_i,u_i)\,\delta x + f_u(x_i,u_i)\,\delta u, \quad \delta x(t_0) = 0 \Bigg\}. \tag{8.11} $$


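Here, in order to obtain faster convergence than the iterative process based on this linear-quadratic
approximation, we consider the following quadratic-quadratic approximation to the original problem.
Namely, at the $i$-th step, all the functions including the differential equations are approximated by
quadratic functions around the $i$-th solution estimate $(u_i, x_i)$ with $x_i = x[u_i]$:

$$ \min_{\delta u^B} \max_{\delta u^R} \Bigg\{ \int_{t_0}^{t_f} \Big[ g(x_i,u_i) + g_x(x_i,u_i)\,\delta x + g_u(x_i,u_i)\,\delta u + \tfrac12\, \delta x'\, g_{xx}(x_i,u_i)\, \delta x $$
$$ \qquad + \tfrac12\, \delta u'\, g_{ux}(x_i,u_i)\, \delta x + \tfrac12\, \delta x'\, g_{xu}(x_i,u_i)\, \delta u + \tfrac12\, \delta u'\, g_{uu}(x_i,u_i)\, \delta u \Big]\, dt $$
$$ \qquad + g_f(x_i(t_f)) + (g_f)_x(x_i(t_f))\, \delta x(t_f) + \tfrac12\, \delta x(t_f)'\, (g_f)_{xx}(x_i(t_f))\, \delta x(t_f) $$
$$ \qquad \Bigg|\ \frac{d}{dt}\delta x^{(j)} = f^{(j)}_x(x_i,u_i)\,\delta x + f^{(j)}_u(x_i,u_i)\,\delta u + \tfrac12\, F^{(j)}(x_i,u_i)[\delta x, \delta u], \quad j = 1, 2, \dots, n, \quad \delta x(t_0) = 0 \Bigg\}, \tag{8.12} $$

where the second-order terms of the $j$-th right-hand side are collected as

$$ F^{(j)}(x_i,u_i)[\delta x, \delta u] := \delta x'\, f^{(j)}_{xx}(x_i,u_i)\, \delta x + \delta u'\, f^{(j)}_{ux}(x_i,u_i)\, \delta x + \delta x'\, f^{(j)}_{xu}(x_i,u_i)\, \delta u + \delta u'\, f^{(j)}_{uu}(x_i,u_i)\, \delta u. $$

For the sake of notational convenience, the set of $n$ scalar ordinary differential equations in (8.12) will be
compactly denoted by the vector differential equation:

$$ \frac{d}{dt}\delta x = f_x(x_i,u_i)\,\delta x + f_u(x_i,u_i)\,\delta u + \tfrac12\, F(x_i,u_i)[\delta x, \delta u], \quad \delta x(t_0) = 0. \tag{8.13} $$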

Although the quadratic expansions in (8.12) of the original functions, $f$, $g$ and $g_f$, around the $i$-th
solution estimate $(u_i, x_i)$ approximate the original problem (8.3) better than the linear-quadratic
approximation in (8.11), the $i$-th subproblem (8.12) obviously is not a linearly constrained quadratic
problem. Such nonlinearly constrained problems usually cannot be solved easily. So we replace the
subproblem (8.12) by a linearly constrained quadratic subproblem, whose solution will be the solution
to the quadratically constrained subproblem (8.12) in the limit as the solution estimates converge to the
Nash solution, provided a certain parameter (costate $\mu \in X$) is chosen correctly. We thus propose the
following linearly constrained subproblem for a given triple $(u_i, x_i, \mu_i)$ of control, state and costate with
$x_i = x[u_i]$, where $\mu_i$ is in the space $X$ of $\mathbb{R}^n$-valued $C^1$-class functions on $[t_0, t_f]$:

$$ \min_{\delta u^B} \max_{\delta u^R} \Bigg\{ \int_{t_0}^{t_f} \Big[ g(x_i,u_i) + g_x(x_i,u_i)\,\delta x + g_u(x_i,u_i)\,\delta u + \tfrac12\, \delta x'\, g_{xx}(x_i,u_i)\, \delta x $$
$$ \qquad + \tfrac12\, \delta u'\, g_{ux}(x_i,u_i)\, \delta x + \tfrac12\, \delta x'\, g_{xu}(x_i,u_i)\, \delta u + \tfrac12\, \delta u'\, g_{uu}(x_i,u_i)\, \delta u $$
$$ \qquad + \tfrac12 \sum_{j=1}^{n} \mu_i^{(j)}\, F^{(j)}(x_i,u_i)[\delta x, \delta u] \Big]\, dt $$
$$ \qquad + g_f(x_i(t_f)) + (g_f)_x(x_i(t_f))\, \delta x(t_f) + \tfrac12\, \delta x(t_f)'\, (g_f)_{xx}(x_i(t_f))\, \delta x(t_f) $$
$$ \qquad \Bigg|\ \frac{d}{dt}\delta x = f_x(x_i,u_i)\,\delta x + f_u(x_i,u_i)\,\delta u, \quad \delta x(t_0) = 0 \Bigg\}. \tag{8.14} $$

Here, $\mu_i^{(j)}$ denotes the $j$-th element of an $n$-dimensional vector-valued function $\mu_i$ on $[t_0, t_f]$. In the
following, the last term in the integrand of (8.14) will be compactly expressed as

$$ \tfrac12\, \mu_i'\, F(x_i, u_i)[\delta x, \delta u]. $$


Observe that the second-order terms of the quadratic differential equations (8.13) are removed from
the differential equations and added to the cost function after taking the inner product with the costate
$\mu_i(t)$. Hence, the differential equations in subproblem (8.14) are linear. Because the cost function was
quadratic in (8.12), this addition of quadratic terms does not change the quadratic nature of the original
cost function in (8.12). The idea of moving the quadratic term of the constraints to the quadratic cost was
first proposed by Wilson [9] for optimization in a finite-dimensional context. The convergence analysis
of the resulting iterative method is carried out by Robinson [7] and [8] in a finite-dimensional context.
However, we believe that we are the first to apply the idea to an infinite-dimensional problem in a function
space.
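For orientation, the finite-dimensional form of this idea can be written out as a quadratic subproblem. The sketch below uses generic notation that is not the report's: $\phi$ is an objective, $c$ a vector of equality constraints, $w_i$ the current iterate, $\mu_i$ the current multiplier estimate and $d$ the step to be computed.

```latex
\min_{d}\ \nabla \phi(w_i)' d
  + \tfrac12\, d' \Big[ \nabla^2 \phi(w_i)
  + \sum_{j} \mu_i^{(j)}\, \nabla^2 c^{(j)}(w_i) \Big] d
\quad \text{subject to} \quad
c(w_i) + \nabla c(w_i)' d = 0 .
```

The second-order constraint terms appear only in the objective, weighted by the multiplier estimate, while the constraint itself is kept linear. This mirrors the passage from subproblem (8.12) to subproblem (8.14), with the costate playing the role of the multiplier.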

As the costate $\mu_i$ of the $i$-th subproblem (8.14), it is quite natural to choose the costate $\nu_i$
corresponding to $(u_i, x_i)$ given by Pontryagin's maximum principle applied to subproblem (8.11). Indeed,
following Pontryagin, we define the costate $\nu_i$ by

$$ \frac{d}{dt}\nu_i(t) = -g_x(x_i(t), u_i(t)) - \sum_{j=1}^{n} \nu_i^{(j)}(t)\, f^{(j)}_x(x_i(t), u_i(t)), \tag{8.15} $$

$$ \nu_i(t_f) = (g_f)_x(x_i(t_f)). \tag{8.16} $$

After finding the Nash control $\delta u_i$ for the $i$-th subproblem (8.14), we update the current control $u_i$ to

$$ u_{i+1}(t) = u_i(t) + \alpha_i\, \delta u_i(t), \quad t \in [t_0, t_f], \tag{8.17} $$

with a step size $\alpha_i \in (0, 1]$. We note that a small step size may be necessary at the beginning for the sake
of stability and that the step size of one is recommended towards the end for the sake of fast convergence.
We then compute its trajectory

$$ x_{i+1}(t) = x[u_{i+1}](t) \tag{8.18} $$

of the original nonlinear differential equation (8.1).

We then update the costate $\nu_i$ to $\nu_{i+1}$, which corresponds to the updated control-state pair $(u_{i+1}, x_{i+1})$
and which is computed from (8.15)-(8.16) with $i$ replaced by $i+1$. Finally, we can use $\nu_{i+1}$ as the next
costate $\mu_{i+1}$. However, in order to stabilize this iterative process, we propose to use

$$ \mu_{i+1}(t) = \mu_i(t) + \rho_i \left[ \nu_{i+1}(t) - \mu_i(t) \right] \tag{8.19} $$

with a step size $\rho_i \in (0, 1]$. When the solution estimate $(u_i, x_i, \mu_i)$ is far from the local (Nash) optimum
$(u^*, x^*, \lambda^*)$, we may have to choose relatively small $\rho_i$ and $\alpha_i$ in order to keep the iterative process stable.
However, when the solution estimate $(u_i, x_i, \mu_i)$ is close to the local (Nash) optimum $(u^*, x^*, \lambda^*)$, we
recommend the choice of $\rho_i = 1$ and $\alpha_i = 1$ to make the convergence faster. If, instead of using (8.19),
we choose $\mu_i = 0$ for all iterations $i$, then our iterative process (8.17)-(8.19) reduces to the Sequential
Linear-Quadratic Method proposed in [5] and [6].

Note  that  both  subproblems  (8.14)  and  (8.12)  reflect  the  quadratic  approximation  of  the  original 
problem  (8.3).  However,  subproblem  (8.14)  can  be  solved  more  readily  (by  a  Riccati  equation  method) 
than  subproblem  (8.12),  because  subproblem  (8.14)  is  a  linearly  constrained  quadratic  problem  while 
(8.12)  is  a  quadratically  constrained  quadratic  problem.  We  expect  that  the  iterative  process  based  on 
subproblem  (8.14)  is  locally  convergent  to  a  local  Nash  solution  and  that  the  convergence  is  fast,  because 
it  is  established  (Robinson  [7]  and  [8])  for  optimization  in  a  finite-dimensional  space  that  the  iterative 
process  based  on  the  same  idea  is  locally  convergent  to  a  local  minimum  and  that  the  rate  of  convergence 
is  quadratic. 


8.4.3  Riccati  Equation  Method 

The iterative method described in the previous section requires a technique for solving a linearly
constrained quadratic game in order to solve subproblem (8.14). We now present such a technique in this
section. The technique consists of solving Riccati differential equations backwards, the linear differential
equations forwards and the linear adjoint differential equations backwards. This technique is known as
the Riccati equation method.

For the game problem (8.3), the dynamic programming approach requires the value function, which
is defined by

$$ J^*(t, z) = \min_{u^B} \max_{u^R} \left\{ \int_t^{t_f} g(x(\tau), u(\tau), \tau)\, d\tau + g_f(t_f, x(t_f)) \;\middle|\; \frac{d}{d\tau} x(\tau) = f(x(\tau), u(\tau), \tau),\ \tau \in [t, t_f],\ x(t) = z \right\}, \tag{8.20} $$

for $t \in [t_0, t_f]$ and $z \in \mathbb{R}^n$. The value function $J^*(t, z)$ satisfies the following boundary condition at time
$t_f$:

$$ J^*(t_f, z) = g_f(t_f, z) \quad \text{for any } z \in \mathbb{R}^n. \tag{8.21} $$

Under the assumption of continuous differentiability, a direct application of the principle of optimality
to (8.20) yields the so-called Hamilton-Jacobi-Isaacs (HJI) equation,

$$ -J^*_t(t, z) = \min_{u^B} \max_{u^R} \left[ J^*_z(t, z)\, f(z, u, t) + g(z, u, t) \right], \tag{8.22} $$

which takes (8.21) as a boundary condition. If there exists a function $J^*(t, z)$ satisfying (8.21) and (8.22),
then the HJI equation provides a means of obtaining a Nash solution.

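We now consider the following affine-quadratic game:

$$ \min_{u^B} \max_{u^R} \left\{ J[x; u^B, u^R] \;\middle|\; \frac{d}{dt} x(t) = A(t)x(t) + B^B(t)u^B(t) + B^R(t)u^R(t) + c(t),\ x(t_0) = z_0 \right\}, \tag{8.23} $$

where

$$ J[x; u^B, u^R] = \int_{t_0}^{t_f} \left\{ \frac12 \begin{bmatrix} x(t) \\ u^B(t) \\ u^R(t) \end{bmatrix}' \begin{bmatrix} Q(t) & N^B(t) & N^R(t) \\ N^B(t)' & R^B(t) & 0 \\ N^R(t)' & 0 & R^R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u^B(t) \\ u^R(t) \end{bmatrix} + \begin{bmatrix} x(t) \\ u^B(t) \\ u^R(t) \end{bmatrix}' \begin{bmatrix} d(t) \\ r^B(t) \\ r^R(t) \end{bmatrix} \right\} dt + \frac12\, x(t_f)' Q_f\, x(t_f) + x(t_f)' r_f. \tag{8.24} $$

Here, we may suppose that the square matrices $Q(t)$, $R^B(t)$, $R^R(t)$ and $Q_f$ are symmetric. Note that the
two control cross product blocks of the quadratic form in the integrand of the cost function (8.24) are
identically zero since we assumed (8.8) in order to simplify the minimax problem in the
Hamilton-Jacobi-Isaacs (HJI) equation (8.22).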

Since one can expect that simple arguments in Anderson and Moore [1] also work for min-max
problems, we may assume the value function $J^*(t, z)$ is quadratic in $z$. Thus, we assume that the value
function takes the following form:

$$ J^*(t, z) = \tfrac12\, z' S(t) z + k(t)' z + m(t), \tag{8.25} $$

where $k(t) \in \mathbb{R}^n$, $m(t) \in \mathbb{R}$ and $S(t) \in \mathbb{R}^{n \times n}$ is a symmetric matrix. We may now solve the HJI equation
(8.22) explicitly (see Basar and Olsder [2]).

Lemma 2. (Riccati equations) The Hamilton-Jacobi-Isaacs equation (8.22) for the linear-quadratic
problem (8.23) has a solution $J^*(t, z)$ of the form (8.25) on $[t_0, t_f] \times \mathbb{R}^n$ if the following system of Riccati
equations has a solution $(S, k, m)$:

$$ \frac{d}{dt}S(t) + S(t)A(t) + A(t)'S(t) - \{S(t)B^B(t) + N^B(t)\}\, R^B(t)^{-1} \{B^B(t)'S(t) + N^B(t)'\} $$
$$ \qquad - \{S(t)B^R(t) + N^R(t)\}\, R^R(t)^{-1} \{B^R(t)'S(t) + N^R(t)'\} + Q(t) = 0, \tag{8.26} $$

$$ \frac{d}{dt}k(t) + A(t)'k(t) - \{S(t)B^B(t) + N^B(t)\}\, R^B(t)^{-1} \{B^B(t)'k(t) + r^B(t)\} $$
$$ \qquad - \{S(t)B^R(t) + N^R(t)\}\, R^R(t)^{-1} \{B^R(t)'k(t) + r^R(t)\} + S(t)c(t) + d(t) = 0, \tag{8.27} $$

$$ \frac{d}{dt}m(t) - \tfrac12 \{k(t)'B^B(t) + r^B(t)'\}\, R^B(t)^{-1} \{B^B(t)'k(t) + r^B(t)\} $$
$$ \qquad - \tfrac12 \{k(t)'B^R(t) + r^R(t)'\}\, R^R(t)^{-1} \{B^R(t)'k(t) + r^R(t)\} + k(t)'c(t) = 0, \tag{8.28} $$

with the terminal conditions,

$$ S(t_f) = Q_f, \quad k(t_f) = r_f, \quad m(t_f) = 0. \tag{8.29} $$


We can obtain the following explicit formula for the Nash control in a state feedback form.

Proposition 3. Suppose that a solution $(S, k, m)$ to the equations (8.26)-(8.28) with (8.29) exists on all
of $[t_0, t_f]$. Then a Nash solution $u^*$ to the linear-quadratic differential game (8.23) is found from

$$ u^{*B}(t) = K^B(t)x^*(t) + c^B(t) = -R^B(t)^{-1} \left[ \{B^B(t)'S(t) + N^B(t)'\}\, x^*(t) + B^B(t)'k(t) + r^B(t) \right], \tag{8.30} $$

$$ u^{*R}(t) = K^R(t)x^*(t) + c^R(t) = -R^R(t)^{-1} \left[ \{B^R(t)'S(t) + N^R(t)'\}\, x^*(t) + B^R(t)'k(t) + r^R(t) \right], \tag{8.31} $$

and the corresponding value is given by

$$ J[x^*; u^*] = J^*(t_0, z_0) = \tfrac12\, z_0' S(t_0) z_0 + k(t_0)' z_0 + m(t_0), \tag{8.32} $$

where $x^* = x[u^*]$ is the state trajectory driven by the Nash control $u^*$.

The above proposition states that, just like the well-known standard form of the Riccati equation
method for the regulator problem, we may compute the Nash solution $u^*$ by the following procedure. By
substituting (8.30) and (8.31) into the linear ordinary differential equation in (8.23), we obtain

$$ \frac{d}{dt}x(t) = \left[ A(t) - B^B(t)R^B(t)^{-1}\{B^B(t)'S(t) + N^B(t)'\} - B^R(t)R^R(t)^{-1}\{B^R(t)'S(t) + N^R(t)'\} \right] x(t) $$
$$ \qquad + \left[ -B^B(t)R^B(t)^{-1}\{B^B(t)'k(t) + r^B(t)\} - B^R(t)R^R(t)^{-1}\{B^R(t)'k(t) + r^R(t)\} + c(t) \right], \quad x(t_0) = z_0, \tag{8.33} $$

and can compute its solution $x^*$ from it. Finally, we can compute the optimal control $u^*$ from $x^*$ by
(8.30) and (8.31):

$$ u^{*B}(t) = -R^B(t)^{-1} \left[ \{B^B(t)'S(t) + N^B(t)'\}\, x^*(t) + B^B(t)'k(t) + r^B(t) \right], \tag{8.34} $$

$$ u^{*R}(t) = -R^R(t)^{-1} \left[ \{B^R(t)'S(t) + N^R(t)'\}\, x^*(t) + B^R(t)'k(t) + r^R(t) \right]. \tag{8.35} $$
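The backward Riccati sweep and the resulting feedback gains can be illustrated on the smallest possible case. The sketch below assumes a scalar state, scalar controls, $N^B = N^R = 0$ and $d = r^B = r^R = c = 0$ (so that $k = m = 0$), and integrates the scalar form of (8.26) with a classical Runge-Kutta scheme. The function names, the integrator and all numerical values are assumptions for illustration, not the implementation used in the experiments.

```python
def solve_scalar_riccati(a, b_blue, b_red, q, r_blue, r_red, q_f, t_f, steps=1000):
    """Integrate dS/dt = -2*a*S - q + h*S**2 backwards from S(t_f) = q_f.

    This is the scalar form of (8.26) with N^B = N^R = 0 and
    h = b_blue**2 / r_blue + b_red**2 / r_red. Returns S sampled on a
    uniform grid over [0, t_f]; index 0 corresponds to t = 0.
    """
    h = b_blue ** 2 / r_blue + b_red ** 2 / r_red

    def rhs(s):
        return -2.0 * a * s - q + h * s * s

    dt = t_f / steps
    s_grid = [0.0] * (steps + 1)
    s_grid[steps] = q_f
    for n in range(steps, 0, -1):
        s = s_grid[n]
        # Classical RK4 step with a negative time increment (backward sweep).
        k1 = rhs(s)
        k2 = rhs(s - 0.5 * dt * k1)
        k3 = rhs(s - 0.5 * dt * k2)
        k4 = rhs(s - dt * k3)
        s_grid[n - 1] = s - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return s_grid


def feedback_gains(s, b_blue, b_red, r_blue, r_red):
    """Scalar versions of the gains K^B and K^R in (8.30)-(8.31) when N = 0."""
    return -b_blue * s / r_blue, -b_red * s / r_red


# Hypothetical data: positive Blue weight (minimizer), negative Red weight (maximizer).
s_grid = solve_scalar_riccati(a=0.0, b_blue=1.0, b_red=0.5, q=1.0,
                              r_blue=1.0, r_red=-4.0, q_f=0.0, t_f=1.0)
k_blue, k_red = feedback_gains(s_grid[0], 1.0, 0.5, 1.0, -4.0)
value_at_t0 = 0.5 * s_grid[0] * 2.0 ** 2  # (8.32) with z0 = 2 and k = m = 0
```

As in Proposition 3, the sketch presumes that the Riccati solution exists on the whole interval; with a negative Red weight this is not guaranteed for every choice of data.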

8.4.4  SQQM  Iterative  Algorithm  for  Game  Solution 

In this section, for nonlinear-quadratic games, we present an iterative algorithm which implements the
Sequential Quadratic-Quadratic Method (SQQM) discussed in Section 8.4.2 with the linear-quadratic
solution based on the Riccati equation method of Section 8.4.3. Thus, we assume that the cost function
$J(u)$ has the form given in (8.24), where the block matrices and component vectors of the original cost
function (8.24) are denoted by barred notations, i.e., respectively by $\bar Q(t)$, $\bar R^B(t)$, $\bar R^R(t)$, $\bar N^B(t)$, $\bar N^R(t)$, $\bar d(t)$, $\bar r^B(t)$
and $\bar r^R(t)$, to distinguish them from the various matrix-valued functions for the Riccati equations.


Sequential  Quadratic-Quadratic  Method 

Step 0: Select a stopping criterion $\epsilon > 0$, and an initial control-trajectory-costate triple $(u_0, x_0, \mu_0)$
with $x_0 = x[u_0]$, where $\mu_0 = 0$ or is determined by

$$ \frac{d}{dt}\mu_0(t) = -g_x(x_0(t), u_0(t)) - \sum_{j=1}^{n} \mu_0^{(j)}(t)\, f^{(j)}_x(x_0(t), u_0(t)), \quad \mu_0(t_f) = (g_f)_x(x_0(t_f)). \tag{8.36} $$

Set the counter $i = 0$.
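Step 1: Set the matrix-valued functions as follows:

$$ A(t) = f_x(x_i(t), u_i(t)), \quad B^B(t) = f_{u^B}(x_i(t), u_i(t)), \tag{8.37} $$
$$ B^R(t) = f_{u^R}(x_i(t), u_i(t)), \quad c(t) = 0, \tag{8.38} $$
$$ d(t) = \bar d(t) + \bar Q(t)x_i(t) + \bar N^B(t)u_i^B(t) + \bar N^R(t)u_i^R(t), \tag{8.39} $$
$$ r^B(t) = \bar r^B(t) + \bar N^B(t)'x_i(t) + \bar R^B(t)u_i^B(t), \tag{8.40} $$
$$ r^R(t) = \bar r^R(t) + \bar N^R(t)'x_i(t) + \bar R^R(t)u_i^R(t), \tag{8.41} $$
$$ Q(t) = \bar Q(t) + \mu_i(t)' f_{xx}(x_i(t), u_i(t)), \tag{8.42} $$
$$ R^B(t) = \bar R^B(t) + \mu_i(t)' f_{u^B u^B}(x_i(t), u_i(t)), \tag{8.43} $$
$$ R^R(t) = \bar R^R(t) + \mu_i(t)' f_{u^R u^R}(x_i(t), u_i(t)), \tag{8.44} $$
$$ N^B(t) = \bar N^B(t) + \mu_i(t)' f_{x u^B}(x_i(t), u_i(t)), \tag{8.45} $$
$$ N^R(t) = \bar N^R(t) + \mu_i(t)' f_{x u^R}(x_i(t), u_i(t)). \tag{8.46} $$

Here a product such as $\mu_i' f_{xx}$ stands for $\sum_{j=1}^{n} \mu_i^{(j)} f^{(j)}_{xx}$, in agreement with the costate-weighted term in (8.14).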


Step 2: Solve the following Riccati equations:

$$ \frac{d}{dt}S(t) + S(t)A(t) + A(t)'S(t) - \{S(t)B^B(t) + N^B(t)\}\, R^B(t)^{-1} \{B^B(t)'S(t) + N^B(t)'\} $$
$$ \qquad - \{S(t)B^R(t) + N^R(t)\}\, R^R(t)^{-1} \{B^R(t)'S(t) + N^R(t)'\} + Q(t) = 0, \tag{8.47} $$

$$ \frac{d}{dt}k(t) + A(t)'k(t) - \{S(t)B^B(t) + N^B(t)\}\, R^B(t)^{-1} \{B^B(t)'k(t) + r^B(t)\} $$
$$ \qquad - \{S(t)B^R(t) + N^R(t)\}\, R^R(t)^{-1} \{B^R(t)'k(t) + r^R(t)\} + S(t)c(t) + d(t) = 0, \tag{8.48} $$

$$ \frac{d}{dt}m(t) - \tfrac12 \{k(t)'B^B(t) + r^B(t)'\}\, R^B(t)^{-1} \{B^B(t)'k(t) + r^B(t)\} $$
$$ \qquad - \tfrac12 \{k(t)'B^R(t) + r^R(t)'\}\, R^R(t)^{-1} \{B^R(t)'k(t) + r^R(t)\} + k(t)'c(t) = 0, \tag{8.49} $$

backwards from the terminal conditions

$$ S(t_f) = Q_f, \quad k(t_f) = \bar r_f + Q_f\, x_i(t_f), \quad m(t_f) = 0 \tag{8.50} $$

and obtain the solution $(S(t), k(t), m(t))$.

Step 3: Solve the linear ordinary differential equation:

$$ \frac{d}{dt}\delta x_i(t) = \left[ A(t) - B^B(t)R^B(t)^{-1}\{B^B(t)'S(t) + N^B(t)'\} - B^R(t)R^R(t)^{-1}\{B^R(t)'S(t) + N^R(t)'\} \right] \delta x_i(t) $$
$$ \qquad + \left[ -B^B(t)R^B(t)^{-1}\{B^B(t)'k(t) + r^B(t)\} - B^R(t)R^R(t)^{-1}\{B^R(t)'k(t) + r^R(t)\} + c(t) \right], \quad \delta x_i(t_0) = 0 \tag{8.51} $$

forwards and obtain the state correction $\delta x_i(t)$.

Step 4: Compute the control correction $\delta u_i(t)$ from the state correction $\delta x_i(t)$ by

$$ \delta u_i^B(t) = K^B(t)\,\delta x_i(t) + c^B(t) = -R^B(t)^{-1} \left[ \{B^B(t)'S(t) + N^B(t)'\}\, \delta x_i(t) + B^B(t)'k(t) + r^B(t) \right], \tag{8.52} $$

$$ \delta u_i^R(t) = K^R(t)\,\delta x_i(t) + c^R(t) = -R^R(t)^{-1} \left[ \{B^R(t)'S(t) + N^R(t)'\}\, \delta x_i(t) + B^R(t)'k(t) + r^R(t) \right]. \tag{8.53} $$

Step 5: Update the control by

$$ u_{i+1}(t) = u_i(t) + \alpha_i\, \delta u_i(t) \tag{8.54} $$

with a step size $\alpha_i \in (0, 1]$.

Step 6: Compute $x_{i+1} = x[u_{i+1}]$ by solving the original differential equation forwards:

\[ \frac{d}{dt}x_{i+1}(t) = f(x_{i+1}(t), u_{i+1}(t)), \qquad x_{i+1}(t_0) = x_0. \tag{8.55} \]
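The forward integration of Step 6 can be sketched as follows for a scalar state. This is only an illustration: the fixed-step Runge-Kutta integrator, the function name `integrate_forward` and the test dynamics are assumptions, whereas the experiments reported below relied on the Matlab integration routines.

```python
import math

def integrate_forward(f, u, x0, t0, tf, steps):
    """Integrate dx/dt = f(x, u(t)), x(t0) = x0, forwards with classical RK4.

    f : callable (x, u_value) -> dx/dt
    u : callable t -> control value at time t
    Returns the list of states on the uniform grid from t0 to tf.
    """
    h = (tf - t0) / steps
    t, x = t0, x0
    states = [x0]
    for _ in range(steps):
        k1 = f(x, u(t))
        k2 = f(x + 0.5 * h * k1, u(t + 0.5 * h))
        k3 = f(x + 0.5 * h * k2, u(t + 0.5 * h))
        k4 = f(x + h * k3, u(t + h))
        x = x + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
        states.append(x)
    return states

# Hypothetical plant dx/dt = -x + u with constant control u = 1 and x(0) = 0;
# the exact solution is x(t) = 1 - exp(-t).
states = integrate_forward(lambda x, u: -x + u, lambda t: 1.0, 0.0, 0.0, 2.0, 200)
x_final = states[-1]
```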


Step 7: Compute the new costate $\nu_{i+1}$ by solving the adjoint differential equation:

\[
\frac{d}{dt}\nu_{i+1}(t) = -d(t) - Q(t)\,x_{i+1}(t) - N^B(t)\,u_{i+1}^B(t) - N^R(t)\,u_{i+1}^R(t) - \nu_{i+1}(t)'A(t) \tag{8.56}
\]


backwards  from  the  terminal  condition  =  77  4*  Q/x^i^/). 

Step 8: Update the costate by

\[ \mu_{i+1}(t) = \mu_i(t) + \rho_i\,\{\nu_{i+1}(t) - \mu_i(t)\} \tag{8.57} \]

with a step size $\rho_i \in (0, 1]$.

Step 9: If $\|\delta u_i\| < \epsilon$, stop. Otherwise, go to Step 1 with $i$ replaced by $i + 1$.


Remark 4. When the estimate $(u_i, x_i, \mu_i)$ is far from a local (Nash) optimum $(u^*, x^*, \lambda^*)$, we may choose positive numbers less than one for the step sizes $\alpha_i$ and $\rho_i$ in order to keep the iterative process stable. However, when the estimate $(u_i, x_i, \mu_i)$ is close to a local (Nash) optimum $(u^*, x^*, \lambda^*)$, we recommend the choice $\alpha_i = 1$ and $\rho_i = 1$ to make the iterative process converge faster. Recall that $\delta u_i = 0$ is a necessary condition for the Nash solution $u^*$. Hence a norm $\|\delta u_i\|$ of the control correction, given either by (8.58) or by the supremum-type expression (8.59), may be used to measure the proximity of the current solution estimate $u_i$ to the Nash solution $u^*$.
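The damped updates of Steps 5 and 8 and the stopping test of Step 9 can be sketched as a short driver loop. In this sketch the callables `correction` and `new_costate` are hypothetical stand-ins for Steps 1-4 and Steps 6-7 respectively, the controls and costates are plain lists of samples on a time grid, and the norm is taken to be the largest absolute sample; none of these choices is taken from the report's implementation.

```python
def sup_norm(samples):
    """Largest absolute value over the time samples."""
    return max(abs(s) for s in samples)

def sequential_iteration(u0, mu0, correction, new_costate,
                         alpha=1.0, rho=1.0, eps=1e-8, max_iter=200):
    """Damped control/costate iteration following Steps 5, 8 and 9."""
    u, mu = list(u0), list(mu0)
    for i in range(max_iter):
        du = correction(u, mu)                          # stand-in for Steps 1-4
        u = [ui + alpha * d for ui, d in zip(u, du)]    # Step 5, eq. (8.54)
        nu = new_costate(u)                             # stand-in for Steps 6-7
        mu = [m + rho * (n - m) for m, n in zip(mu, nu)]  # Step 8, eq. (8.57)
        if sup_norm(du) < eps:                          # Step 9
            return u, mu, i + 1
    return u, mu, max_iter

# Toy stand-ins (assumptions): the correction points towards a fixed target
# control, and the costate associated with a control u is simply 2*u.
target = [1.0, 2.0]
u_end, mu_end, n_iter = sequential_iteration(
    u0=[0.0, 0.0], mu0=[0.0, 0.0],
    correction=lambda u, mu: [t - ui for t, ui in zip(target, u)],
    new_costate=lambda u: [2.0 * ui for ui in u],
    alpha=0.5, rho=0.5,
)
```

With `alpha = rho = 0.5` the toy iteration contracts by a factor of two per pass; setting both to 1 reaches the target in a single update, which mirrors the trade-off between stability and speed described in Remark 4.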
8.5 Experiment Results and Analysis

The cases shown here cover both the scenario for 1 unit vs. 1 unit and the scenario for 5 units vs. 5 units, for both Model 2 and Model 3. The results for 1 unit vs. 1 unit are shown in Figure 8.1 and Figure 8.2. The results for 5 units vs. 5 units are shown in Figure 8.3 and Figure 8.4. Each Blue unit starts with 10 interceptors, and each Red unit starts with 10 bombers. In these experiments (see footnote 1), each force has two objectives: i) to reach its specified fixed destination target, and ii) to reduce the number of enemy platforms while preserving as many of its own platforms as possible.

We started each experiment from an initial solution estimate whose velocity controls are constant, so that the resulting trajectory of each unit is the straight line from its initial location to the location of its destination target. The initial firing intensity control for each force was likewise chosen to be a constant function.


Figure 8.1: Convergence of SQQM and SLQM (Model 2, 1 unit vs. 1 unit). Legend: ooo = quadratic-quadratic, +++ = linear-quadratic.

The actual implementation in Matlab of the Quadratic-Quadratic Algorithm is based on the implementation of the Linear-Quadratic Algorithm. The main difference resides in the introduction of the quadratic term of the plant model, by way of a costate, into the cost function. As the value of the costate is also recomputed at each iteration (by solving a system of ordinary differential equations), the choice of its initial value greatly influences the convergence and speed of the iterative process. By default, the initial value of the costate is currently set to zero; if a better initial value is found by other means (e.g. the method of characteristics), faster convergence may be attained.

The iterations are performed by specifying the following parameters: the maximum number of iterations, the threshold for the norm of the control correction $\|\delta u_i\|$ under which convergence is declared, the step-size $\alpha_i$ for the control refinement and the step-size $\rho_i$ for the costate refinement. A conservative choice, i.e. smaller values, of the last two parameters, $\alpha_i$ and $\rho_i$, is safer for the stability of the sequential iterations, but at the price of a slower convergence speed.

A special remark must be made about numerical precision: as expected from theoretical results, the Quadratic-Quadratic Algorithm is more sensitive to numerical round-off and approximation errors than the Linear-Quadratic Algorithm. Thus, to arrive at a small value of the norm of the control increment $\|\delta u_i\|$, it might be necessary to tighten the relative tolerance and/or the absolute tolerance of the chosen integration algorithms; for this purpose, we made use of integration ‘options’ (as provided in Matlab) at each call of the integration routines. The current default values have been selected after running several scenarios, obtaining a reasonable compromise between speed and accuracy.

Footnote 1: These scenarios are typical of simulations described in greater detail in other chapters; here our goal is to compare the behavior of the SQQM algorithm with that of the SLQM algorithm.

Figure 8.2: Convergence of SQQM and SLQM (Model 3, 1 unit vs. 1 unit).

Finally, we mention that the quadratic algorithm requires the numerical evaluation of the Hessian of the plant model. The subprograms for computing the Hessians have been built using the symbolic math tool of the Maple core in Matlab; this procedure for building the subprograms has been automated, so as to be able to switch easily between models (i.e., between Model 2 and Model 3). When the size of the problem increases, the work required to evaluate the Hessian can considerably increase the computational time.
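To make the role of the Hessian concrete, the sketch below evaluates the costate-weighted Hessian $\mu' f_{xx}$ that enters (8.42). Note the substitution: the report builds the Hessian subprograms symbolically, whereas this illustration uses central finite differences, chosen only because it needs no symbolic toolbox. The function names and the two-state plant are hypothetical.

```python
def hessian_fd(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            def shifted(sa, sb):
                y = list(x)
                y[a] += sa * h
                y[b] += sb * h
                return f(y)
            H[a][b] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H

def costate_weighted_hessian(components, mu, x, h=1e-4):
    """Sum over k of mu[k] * Hessian(f_k)(x), i.e. the contraction mu' f_xx."""
    n = len(x)
    total = [[0.0] * n for _ in range(n)]
    for mu_k, f_k in zip(mu, components):
        H_k = hessian_fd(f_k, x, h)
        for a in range(n):
            for b in range(n):
                total[a][b] += mu_k * H_k[a][b]
    return total

# Hypothetical two-component plant: f_1 = x0^2 * x1, f_2 = x0 * x1.
components = [lambda x: x[0] ** 2 * x[1], lambda x: x[0] * x[1]]
W = costate_weighted_hessian(components, mu=[1.0, 2.0], x=[1.0, 2.0])
```

Each Hessian entry costs four function evaluations per plant component, so the work grows with the square of the state dimension, which is consistent with the growth in computational time noted above.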

Several experiments on different scenarios, based on both Model 2 and Model 3, have shown the convergence of the outputs of the SQQM and SLQM algorithms to the same solution. As for speed, the SQQM has generally proven to be faster in scenarios that are not too complex, both for Model 2 and Model 3. For example, in Figure 8.1 and Figure 8.2, the norm $\|\delta u\|$ of the control correction is shown for the two methods as a function of the time needed by the iterative procedure. For the SLQM, the norm reduces slowly but steadily; for the SQQM, instead, it first increases, due perhaps to a poor initial choice of the costate $\mu_0$, but then decreases much more rapidly than for the SLQM. Note that the time needed for the computation of each iteration is almost halved, for both the SLQM and the SQQM, when using Model 3 instead of Model 2, due to a simplification in model structure; this can be seen in all the figures.

For higher-order models, the computational time required at each iteration becomes very large, and the SLQM in general performs better, as shown in Figure 8.3 and Figure 8.4 for a scenario of 5 units vs. 5 units. We need to point out the difference between the two methods in the computational time needed for one iteration. The horizontal distance between successive points indicates the time needed to complete one iteration. For example, in Figure 8.4 one iteration of the SQQM takes about 60 seconds, while one iteration of the SLQM takes only 15 seconds. Indeed, when the number of units increases, the small number of iterations of the quadratic algorithm cannot compensate for the computational burden of evaluating the Hessian along the estimated trajectory. The choice of the initial value for the costate $\mu_i$ in the SQQM is critical, in the sense that the simulation may stop before the costate starts to converge.

In order to surmount this apparent weakness of the SQQM algorithm in complex cases, we devised an alternative method that uses both the linear-quadratic and the quadratic-quadratic methods. First, the linear-quadratic algorithm is started, and a test is routinely performed to monitor whether the quadratic-quadratic algorithm can take over, namely when the correction norm at the next estimate computed by the SQQM is smaller than the correction norm at the next estimate computed by the SLQM. The linear-quadratic algorithm thus serves two purposes: reaching a solution estimate that is sufficiently close to the optimal solution, and supplying a costate estimate at the moment when it becomes possible to switch to the quadratic-quadratic algorithm. For example, in Figure 8.5, the diamond line shows the performance of the blended method. There the SQQM takes over after just one step of the SLQM, and at that point the step-size $\rho_i$ for the costate update can be increased to its maximum value of 1, considerably increasing the convergence rate: the norm of the correction in the diamond line reduces very quickly to $10^{-1}$. Such a blend of the two methods gives the best results: once the SQQM takes over, it converges to the optimal solution more rapidly than the SLQM alone or the SQQM alone. Note that several parameters must be set, namely the step-sizes for the control update and the costate update, and the switching conditions.

Figure 8.3: Convergence of SQQM and SLQM (Model 2, 5 units vs. 5 units). Legend: ooo = quadratic-quadratic, +++ = linear-quadratic.

Figure 8.4: Convergence of SQQM and SLQM (Model 3, 5 units vs. 5 units).

Figure 8.5: Convergence of SQQM, SLQM and SLQM-SQQM (Model 2, 5 units vs. 5 units).
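The switching test of the blended method can be sketched as follows. The sketch tracks only the correction norm, and the two step functions are toy models (an assumption, not measured behavior): the linear-quadratic step contracts the norm by a fixed factor, while the quadratic-quadratic step squares it, so it helps only once the norm is small enough. All names are hypothetical.

```python
def blended_run(norm0, slqm_step, sqqm_step, tol=1e-6, max_iter=100):
    """Start with the SLQM and hand over to the SQQM when its next
    correction norm is smaller than the one the SLQM would produce."""
    norm = norm0
    using_sqqm = False
    switch_iter = None
    history = [norm]
    for i in range(max_iter):
        if norm < tol:
            break
        if using_sqqm:
            norm = sqqm_step(norm)
        else:
            norm_lq = slqm_step(norm)   # candidate from the linear-quadratic step
            norm_qq = sqqm_step(norm)   # candidate from the quadratic-quadratic step
            if norm_qq < norm_lq:
                using_sqqm = True       # the SQQM takes over from here on
                switch_iter = i
                norm = norm_qq
            else:
                norm = norm_lq
        history.append(norm)
    return history, switch_iter

# Toy contraction models for the two methods (assumptions).
history, switch_iter = blended_run(
    norm0=1.5,
    slqm_step=lambda e: 0.8 * e,
    sqqm_step=lambda e: e * e,
)
```

In this toy run the test evaluates both candidates at every iteration until the switch, so the monitoring itself costs one extra quadratic-quadratic step per iteration; how often to run the test is one of the switching parameters mentioned above.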

8.6  Conclusions  and  Recommendations 

The Sequential Quadratic-Quadratic Algorithm converges to the same solution found through the Sequential Linear-Quadratic Algorithm, for all models and scenarios, so the first hypothesis of Experiment 8 is confirmed. As for the second hypothesis, namely an improvement in convergence speed, the conclusion is also clear at this point: the SQQM alone proves to be faster in simple scenarios; if, however, the starting trajectory and costate estimates are too far from the optimal solution, the SLQM may be used at first, switching to the SQQM once the solution estimate is close to the optimal solution. In more complex cases it is thus advantageous to blend the linear-quadratic algorithm and the quadratic-quadratic algorithm, taking advantage of both the superior stability of the SLQM and the superior speed of the SQQM.
Chapter 9


Experiment 9: Detector Performance under Noise

9.1  Executive  Summary 

In this Chapter we report the experiments performed to test the effectiveness of a newly designed “game-theoretic-optimal” detection filter in handling noise-corrupted observations of the battlefield. The basic
purpose  of  the  detection  filter  is  to  reveal  the  occurrence  of  an  “engagement  action”  from  enemy  units  by 
monitoring  only  variables  associated  with  the  friendly  units.  The  game-theoretic  approach  to  the  design 
of  the  filter  makes  it  possible  to  attenuate  the  effects  of  measurement  noises,  but  not  the  effects  of  the 
action  to  be  detected.  The  outcome  of  the  experiments  shows  very  clearly  that  the  game-theoretic  filter 
is  very  effective  under  different  situations  of  noise  and  compares  very  favorably  with  a  filter  designed  on 
the  basis  of  classical  state-estimation  methods. 


9.2  Purpose  of  the  Experiment 

This  section  of  the  report  describes  experiments  on  detection  and  isolation  of  multiple  enemy  actions  in 
a  battlefield.  Specifically,  the  basic  purpose  of  this  first  series  of  experiments  is  to  test  the  effectiveness 
of a newly designed (“game-theoretic”) filter in handling noise-corrupted observations of the battlefield.

The mathematical description of the battlefield used here is the one introduced in [1]. We consider the case in which two opposing forces are present in the theater of operations, the Blue force (the “friends”) and the Red force (the “enemies”). Each force consists of two units, and each unit consists of a number of platforms whose evolution in time is described by a first-order nonlinear differential equation and depends on the “actions” which the opposing units are performing against the unit in question. If any “new” action is performed by any of the opposing units, this affects the evolution of the number of platforms of the other force’s units. Letting $\eta_i^R$ and $\eta_i^B$, with $i = 1, 2$, denote the number of platforms of the $i$-th Red and, respectively, $i$-th Blue unit, the model in question is a four-dimensional nonlinear system described by two pairs of equations of the form (cf. [1])

In these equations, $\pi_{ij}^R(\cdot)$ and $\pi_{ij}^B(\cdot)$ are (independent) input variables representing the “level of engagement” of the $j$-th Red unit with the $i$-th Blue unit and, respectively, of the $j$-th Blue unit with the $i$-th Red unit. For convenience, we suppose in all our experiments that

\[ \pi_{11}^R(t) = \pi_{21}^R(t) =: \pi_1^R(t), \qquad \pi_{12}^R(t) = \pi_{22}^R(t) =: \pi_2^R(t). \]

This means, in the terminology of [1], page 2, that we allow the “unique target constraint” to be violated (for the Red units only).

The basic problem addressed in our series of experiments on the design of filters for the detection of enemy actions is the following one: we monitor only the number of platforms in the two Blue units (i.e. we measure only the values of the two state variables $\eta_1^B, \eta_2^B$) and we want to detect the occurrence of an “engagement action” from either one of the two Red units (i.e. we want to detect when either one of the two input signals $\pi_i^R(\cdot)$ has become nonzero). Implicit in this is the assumption that the two other state variables (number of platforms of the two enemy units) as well as all the input variables $\pi_{ij}^B$, $i, j = 1, 2$, and $\pi_i^R$, $i = 1, 2$, are not monitored. The purpose of the detection process is precisely the determination of when either $\pi_1^R$ or $\pi_2^R$ has become nonzero, without having it directly measured.


9.3  Hypothesis  to  Prove  or  Disprove 


In the first series of experiments, conceived to test the effectiveness of our “game-theoretic” detection filter, we consider, instead of the full nonlinear model (9.1), a bilinear approximation.

This bilinear model has the form (see [1], equation (29))


d 
dt 

d_ 
dt 
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Vi(t) 

V2D 


—aRT]i(t)  -  yRr/f  (t)nfx (t)  -  (*)*£(*), 

-aRr]%{t)  -  721  (t)*f2(t)  -  722*72* Dn$2(t), 

-aBr)R (t)  -  yuV\{t)^u(t)  -  7iS2f72t(07r2i(t). 


(9.1) 


(*)  =  -<xBr£(t)  -  -  72B2'7?(<)7r2i2(t)  • 

The structure of the filter that we use to solve the detection problem described in the previous section is chosen according to a general (differential-geometric) methodology developed in our papers [2], [3]. This filter receives as inputs the two observed variables $\eta_1^B$, $\eta_2^B$ and generates as outputs two signals $r_1$, $r_2$, called performance signals (also commonly known as residuals), in such a way that $r_i(t)$ is zero if the Red unit $i$ is not engaged with the Blue units at time $t$ (i.e. if $\pi_i^R(t) = 0$), and that $r_i(t)$ is nonzero if the Red unit $i$ is engaged with the Blue units (i.e. if $\pi_i^R(t) \neq 0$). Specifically, this filter is


Vi{t)  = 

- OiBm{t )  +  9i(ri\(t)  -  -  171(f)). 
in which the “gain parameters” $g_1$ and $g_2$ are to be determined in some “optimal” way.

The basic issue that makes the design of the filter (i.e. of the gain parameters $g_1$ and $g_2$) critical is the presence of measurement noise on the two monitored variables $\eta_1^B, \eta_2^B$ (the two inputs to the filter (9.2)). As a matter of fact, standard methods for state estimation from noisy observations (such as those used in the classical Kalman filter) may well reduce the effect of measurement noise on the performance signals $r_1, r_2$, but only at the expense of a reduction of the sensitivity of the performance signals to the signals $\pi_1^R, \pi_2^R$ that need to be detected. In other words, these methods tend to uniformly filter out the noise affecting the measurements as well as the signals that need to be recognized. This unpleasant circumstance was observed in our first series of experiments, in which a design technique inspired by the theory of the Kalman filter was used.

In order to improve the “diagnostic” capability of the detection filter, it is of primary importance to selectively reduce the effect of the measurement noise while not attenuating the signal associated with the action to detect. We have achieved this goal by casting the detection problem in a game-theoretic framework in which the measurement noise and the event to detect are seen as opposing players (see [4]).

The series of experiments described in this section of the report validates the effectiveness of our “game-theoretic” filter, and demonstrates how this filter compares very favorably with a filter in which the gain parameters are not chosen in a game-theoretic optimal way, but rather on the basis of classical state-estimation methods (in what follows, we will refer to the latter as a “Kalman-like” filter).


9.4  Experiment  Setup 


This series of experiments on the detection of enemy actions is designed in the following way. The battlefield is modeled as in equation (9.1), whose four states represent the number of platforms of each of the four units involved in the battle. The input variables, representing the level of engagement of the battling units, are fixed functions of time. In particular, the two levels of engagement of the Red units 1 and 2 versus the Blue units vary with time as shown in Figure 9.1, where “Action 1” represents the level of engagement of Red unit 1 and “Action 2” represents the level of engagement of Red unit 2. Note that the first action occurs at t = 30 units of time, whereas the second action takes place at t = 50 units of time, while the first action is still occurring. The corresponding behavior in time of the number of Red and Blue platforms in each unit is plotted in Figure 9.2. Finally, Figure 9.3 depicts the outcome of the two observations, namely the evolution in time of the number of platforms in the two Blue units, corrupted by measurement noise (compare with the two bottom graphs in Figure 9.2, depicting the same quantities without measurement noise).

Figure 9.4 shows a block diagram describing the experiment. The two (noise-corrupted) measured observations are fed into the filter: Output 1 of this filter is expected to reveal the occurrence of Action 1, while Output 2 is expected to reveal the occurrence of Action 2. The purpose of the experiment is to compare the effectiveness of the detection process with respect to two different choices of the gain parameters $g_1, g_2$ in the detection filter (9.2). In the first choice, which leads to what we refer to as a “Kalman-like” filter, each parameter $g_i$, $i = 1, 2$, is chosen in such a way as to render

where $[0, t_1]$ is the interval of time over which the experiment is performed, $r_i$, $i = 1, 2$, are the variables as defined in (9.2), $v_i$, for $i = 1, 2$, are the noise signals affecting the measurements of $\eta_i^B$, $Q, M, V$ are suitable weighting matrices, and $\gamma$ is a threshold parameter representing the attenuation of the noise on the residual. In the second choice, which leads to what we refer to as a “game-theoretic” filter, each parameter $g_i$, $i = 1, 2$, is chosen in such a way as to render

in which $\pi_i^R$, $i = 1, 2$, are the enemy engagement actions (which must be detected), $N$ is a weighting matrix and the other variables and parameters are as in (9.3).
Figure 9.4: Block diagram describing the experiment.


The  experimental  data  that  we  have  obtained  show  the  improved  effectiveness  of  the  game-theoretic 
detection  filter  with  respect  to  the  Kalman-like  filter.  This  is  demonstrated  by  the  comparison  of  the  two 
different  time-behaviors  of  the  performance  signals  generated  by  the  two  different  filters,  in  response  to 
the  same  measured  observation  corrupted  by  random  noise.  The  filters  are  tested  with  respect  to  noise 
with  different  amounts  of  energy  and,  for  each  fixed  amount  of  noise  energy,  a  certain  number  of  different 
noise  signals  are  considered. 


9.5  Example  of  experiment 

In  this  experiment  we  consider  a  noise  whose  energy  is  equal  to  15%  of  the  energy  of  the  noise-free 
output.  This  amount  of  noise  is  already  enough  to  hide,  in  the  observation  of  the  number  of  platforms 
of  the  two  friendly  units  (see  Figure  9.3  for  a  picture  of  the  two  noisy  observations),  any  sign  of  the 
occurrence  of  either  one  of  the  two  engagement  actions  from  the  enemy  forces.  Nevertheless,  when  the 
noisy  measurements  are  processed  by  a  detection  filter,  a  good  deal  of  information  about  the  engagement 
actions  which  have  occurred  can  be  derived. 
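The noise levels in these experiments are specified as a fraction of the energy of the noise-free output. A minimal sketch of how such a noise signal could be generated is given below; it assumes that “energy” means the sum of squared samples and that the noise is Gaussian with a fixed seed, neither of which is stated in the report, and the function names are hypothetical.

```python
import random

def energy(signal):
    """Signal energy, taken here as the sum of squared samples (assumption)."""
    return sum(s * s for s in signal)

def noise_with_energy_ratio(clean, ratio, seed):
    """Seeded Gaussian noise rescaled so that
    energy(noise) = ratio * energy(clean)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in clean]
    scale = (ratio * energy(clean) / energy(raw)) ** 0.5
    return [scale * n for n in raw]

# Hypothetical noise-free observation: a slowly decreasing platform count.
clean = [10.0 - 0.05 * k for k in range(100)]
noise_15 = noise_with_energy_ratio(clean, 0.15, seed=1)
noise_25 = noise_with_energy_ratio(clean, 0.25, seed=1)
noisy_observation = [c + n for c, n in zip(clean, noise_15)]
```

Reusing the same seed with a different ratio yields the same noise realization at a different amplitude, which is one way to compare filter responses across energy levels on an equal footing.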

Figure 9.5 depicts the responses to Action 1 of the game-theoretic filter (top) and of the Kalman-like filter (bottom). The comparison of the two plots already demonstrates a much better performance of the game-theoretic filter as opposed to that of the Kalman-like filter. Note also that in neither of the two cases is the output of the filter affected by the occurrence of Action 2, as it should be. Figure 9.6 depicts the responses to Action 2 of the game-theoretic filter (top) and of the Kalman-like filter (bottom). The comparison of these latter plots demonstrates more dramatically the efficiency of the game-theoretic filter. As a matter of fact, the presence of noise on the observations is sufficient to completely hide the occurrence of Action 2 in the output of the Kalman-like filter, while the output of the game-theoretic filter still reveals this occurrence with a good deal of confidence. For the sake of completeness, in Figure 9.7 we also show the time histories of the two state variables of the two filters.

A special feature of the detection process that deserves to be stressed is the complete “independence” or “non-interaction” of the two performance signals. The first one increases its response upon the occurrence of the first action but it is not affected by the occurrence of the second action. The second performance signal remains close to zero when the first action takes place, while it becomes patently nonzero as the second action occurs. This “non-interaction” property is at the basis of the isolation process (that is, distinguishing which action is occurring and when) and is an outcome of the geometric approach to the problem of detection and isolation.

Figure 9.8: Response to Action 1 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 25% of the energy of the noise-free output.

Finally,  it  should  be  stressed  that  the  difference  between  the  two  performance  signals  depends  not 
only  on  the  fact  that  they  are  responses  to  different  input  signals  but  also  on  the  choice  of  weights  in  the 
cost  function  which  determines  the  parameters  of  the  filters,  which  have  been  differently  chosen  in  the 
two  cases. 


9.6  Results  of  the  Experiments 

Our experiments are aimed also at testing the behavior of the detection filters under measurement noise of progressively increasing energy. The response of the game-theoretic and Kalman-like filters when the energy of the noise is equal to 15% of the energy of the noise-free measurement has been reported in the previous section (cf. Figures 9.5 and 9.6). Here we consider the responses to the same noise (the seed is the same as before) but with an increased level of energy. In particular, we consider the cases of noise whose energy is equal to 25%, 35%, 55%, 80% and 110% of the energy of the noise-free measurement, and plot the responses of the two filters in Figures 9.8-9.9, 9.10-9.11, 9.12-9.13, 9.14-9.15 and 9.16-9.17, respectively.

In  our  experiments,  we  have  also  tested  the  behavior  of  the  detection  filter  under  different  kinds  of 
noise  (randomly  assigned  seed)  and  different  energy  levels  of  the  noise.  In  particular,  for  each  level  of 
noise  (15%,  25%,  35%  of  the  energy  of  the  noise-free  measurement  signal),  we  have  run  three  experiments 
for  the  game-theoretic  filter  and  three  experiments  for  the  Kalman-like  filter,  each  one  with  a  different 
noise  signal.  The  outcome  of  these  experiments  is  depicted,  for  the  different  levels  of  noise,  in  Figures 
9.18-9.19,  Figures  9.20-9.21  and  Figures  9.22-9.23,  respectively. 


9.7  Conclusions  and  Recommendations 

The game-theoretic detection and isolation filter shows a very good performance under different conditions of noise corrupting the measurements. This is particularly true when compared to the behavior of the Kalman-like detection filter. The main reason for this improvement is in the selective attenuation of the noise accomplished by the game-theoretic filter, which does not reduce the effect of the signal to detect. This selective attenuation of the noise increases the diagnostic capability of the game-theoretic filter even in the presence of high levels of noise energy.
Figure 9.11: Response to Action 2 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 35% of the energy of the noise-free output.

Figure 9.12: Response to Action 1 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 55% of the energy of the noise-free output.

Figure 9.13: Response to Action 2 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 55% of the energy of the noise-free output.

Figure 9.14: Response to Action 1 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 80% of the energy of the noise-free output.

Figure 9.15: Response to Action 2 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 80% of the energy of the noise-free output.

Figure 9.16: Response to Action 1 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 110% of the energy of the noise-free output.

Figure 9.17: Response to Action 2 of the game-theoretic filter (top) and of the Kalman-like filter (bottom) under noise with energy equal to 110% of the energy of the noise-free output.

Figure 9.18: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 1. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 15% of the output signal energy.

Figure 9.19: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 2. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 15% of the output signal energy.

Figure 9.20: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 1. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 25% of the output signal energy.

Figure 9.21: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 2. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 25% of the output signal energy.

Figure 9.22: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 1. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 35% of the output signal energy.

Figure 9.23: Responses of the game-theoretic (top) and Kalman-like filter (bottom) to Action 2. The six responses correspond to six different random choices of the seed of the noise signals. The energy of the noise signals is equal to 35% of the output signal energy.
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Chapter  10 

Experiment 10: Detector Performance under Parameter Variations

10.1  Executive  Summary 


This chapter reports experimental results on the game-theoretic detection filter under parametric
uncertainty. The exact values of the parameters in the mathematical model of the battlefield are
not known, and only a nominal value is available. The filter, whose objective is to reveal the occurrence
of an “engagement action” from enemy units, is designed on the basis of the nominal value. This set of
experiments shows that the game-theoretic detection filter, although proven to be effective in the selective
attenuation of measurement noise, is relatively sensitive to the uncertainty in the parameters.


10.2  Purpose  of  the  Experiment 


In  [5],  we  tested  the  effectiveness  of  the  game-theoretic  filter  in  detecting  enemy  actions  in  the  case  in 
which  the  measurements  coming  from  the  theater  of  operations  were  affected  by  noise.  The  basic  purpose 
of  this  second  series  of  experiments  is  to  assess  the  robustness  of  the  “game-theoretic”  detection  filter 
(cf.  [4])  in  the  presence  of  uncertainty  in  the  parameters  appearing  in  the  mathematical  model  of  the 
battlefield.  Following  [5],  we  summarize  in  the  remainder  of  this  section  the  problem  of  detection  and 
isolation  of  multiple  enemy  actions  in  a  battlefield. 

The  mathematical  description  of  the  battlefield  used  here  is  the  one  introduced  in  [1]  and  already 
adopted  in  [5].  We  consider  the  case  in  which  two  opposing  forces  are  present  in  the  theater  of  operations, 
the  Blue  force  (the  “friends”)  and  the  Red  force  (the  “enemies”).  Each  force  consists  of  two  units  and 
each  unit  consists  of  a  number  of  platforms  whose  evolution  in  time  is  described  by  a  first  order  nonlinear 
differential  equation  and  depends  on  the  “actions”  which  the  opposing  units  are  performing  against  the 
unit  in  question.  If  any  “new”  action  is  performed  by  any  of  the  opposing  units,  this  affects  the  evolution 
of the number of platforms of the other force’s units. Letting $\eta_i^R$ and $\eta_i^B$, with $i = 1,2$, denote the
number of platforms of the i-th Red and, respectively, i-th Blue unit, the model in question is a
four-dimensional nonlinear system described by two pairs of equations of the form (cf. [1])
In these equations, $\pi_{ji}^R(\cdot)$ and $\pi_{ji}^B(\cdot)$ are (independent) input variables representing the “level of
engagement” of the j-th Red unit with the i-th Blue unit and, respectively, of the j-th Blue unit with
the i-th Red unit. For convenience, we suppose in all our experiments that

$$\pi_{11}^R(t) = \pi_{12}^R(t) =: \pi_1^R(t), \qquad \pi_{21}^R(t) = \pi_{22}^R(t) =: \pi_2^R(t).$$

This  means,  in  the  terminology  of  [1],  page  2,  that  we  allow  the  “unique  target  constraint”  to  be  violated 
(for  the  Red  units  only). 

The  basic  problem  addressed  in  our  series  of  experiments  on  the  design  of  filters  for  the  detection  of 
enemy  actions  is  the  following  one:  we  monitor  only  the  number  of  platforms  in  the  two  Blue  units  (i.e. 
we measure only the values of the two state variables $\eta_1^B$, $\eta_2^B$) and we want to detect the occurrence of
an “engagement action” from either one of the two Red units (i.e. we want to detect when either one of
the two input signals $\pi_i^R(\cdot)$ has become nonzero). Implicit in this is the assumption that the two other
state variables $\eta_1^R$, $\eta_2^R$ (number of platforms of the two enemy units) as well as all the input variables
$\pi_{ij}^B$, $i, j = 1,2$, and $\pi_i^R$, $i = 1,2$, are not monitored. The purpose of the detection process is precisely the
determination of when either $\pi_1^R$ or $\pi_2^R$ has become nonzero, without measuring it directly.


10.3  Hypothesis  to  Prove  or  Disprove 

The experiments of this second series aim to test the performance of the game-theoretic detection
filter in the presence of parametric uncertainties and are implemented considering, as in [5], a bilinear
approximation of (10.1) instead of the full nonlinear model.

This  bilinear  model  has  the  form  (see  [1],  equation  (29)) 


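$$
\begin{aligned}
\frac{d}{dt}\eta_1^R(t) &= -\alpha_1^R \eta_1^R(t) - \gamma_{11}^R \eta_1^B(t)\,\pi_{11}^B(t) - \gamma_{12}^R \eta_2^B(t)\,\pi_{21}^B(t), \\
\frac{d}{dt}\eta_2^R(t) &= -\alpha_2^R \eta_2^R(t) - \gamma_{21}^R \eta_1^B(t)\,\pi_{12}^B(t) - \gamma_{22}^R \eta_2^B(t)\,\pi_{22}^B(t), \\
\frac{d}{dt}\eta_1^B(t) &= -\alpha_1^B \eta_1^B(t) - \gamma_{11}^B \eta_1^R(t)\,\pi_{11}^R(t) - \gamma_{12}^B \eta_2^R(t)\,\pi_{21}^R(t), \\
\frac{d}{dt}\eta_2^B(t) &= -\alpha_2^B \eta_2^B(t) - \gamma_{21}^B \eta_1^R(t)\,\pi_{12}^R(t) - \gamma_{22}^B \eta_2^R(t)\,\pi_{22}^R(t),
\end{aligned}
$$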


in which the (actual) values of the parameters appearing in the equations corresponding to the Blue units,
namely the parameters $\alpha_1^B$, $\alpha_2^B$, $\gamma_{11}^B$, $\gamma_{12}^B$, $\gamma_{21}^B$, $\gamma_{22}^B$, are not perfectly known and may differ from their
nominal values, which are assumed to be known. The other parameters $\alpha_1^R$, $\alpha_2^R$, $\gamma_{11}^R$,
$\gamma_{12}^R$, $\gamma_{21}^R$, $\gamma_{22}^R$ of the model (10.1) do not affect the design and the response of the detection filter, and
consequently their uncertainty does not need to be considered in this series of experiments.

The  structure  of  the  filter  that  we  use  to  solve  the  detection  problem  described  in  the  previous  section 
is  chosen  according  to  a  general  (differential-geometric)  methodology  developed  in  our  papers  [2],  [3]. 
This filter receives as inputs the two observed variables $\eta_1^B$, $\eta_2^B$ and generates as outputs two signals $r_1$,
$r_2$, called performance signals (also commonly known as residuals), in such a way that $r_i(t)$
is zero if the Red unit i is not engaged with the Blue units at time t (i.e. if $\pi_i^R(t) = 0$), and that $r_i(t)$
is nonzero if the Red unit i is engaged with the Blue units (i.e. if $\pi_i^R(t) \neq 0$). Specifically, this filter is
modeled by equations of the form
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in  which  aB  —  af  —  aB  and  the  “gain  parameters”  g\  and  gi  are  to  be  determined  in  some  “optimal” 
way. 

The basic issue that makes the design of the filter (i.e., of the gain parameters $g_1$ and $g_2$) critical is
the presence of measurement noise on the two monitored variables $\eta_1^B$, $\eta_2^B$ (the two inputs to the filter
(10.2)). The experiments reported in [5] demonstrate that a detection filter in which
the gain parameters $g_1$, $g_2$ are chosen as the result of a “game-theoretic” design is substantially more
effective than a detection filter in which the gain parameters are designed through standard methods for
state estimation from noisy observations (such as those used in the classical Kalman filter). In this new
series of experiments, we want to test how the uncertainty in the parameters of the plant (10.1) affects
the behavior of the detection filter in which $g_1$, $g_2$ derive from the “game-theoretic” design.


10.4  Experiment  setup 

This series of experiments on the detection of enemy actions is designed in the following way. The battlefield
is modeled as in equation (10.1), whose four states represent the number of platforms of each of the four
units involved in the battle. The (actual) value of each parameter in (10.1) is unknown. The input
variables, representing the level of engagement of the battling units, are fixed functions of time. In
particular, the two levels of engagement of the Red units 1 and 2 versus the Blue units vary with time
as shown in Figure 10.1, where “Action 1” represents the level of engagement of Red unit 1 and “Action
2” represents the level of engagement of Red unit 2. Note that the first action occurs at t = 30 units
of time, whereas the second action takes place at t = 50 units of time, while the first action is
still occurring. The corresponding behavior in time of the number of Red and Blue platforms in each
unit is plotted in Figure 10.2. Finally, Figure 10.3 depicts the outcome of the two observations, namely
the evolution in time of the number of platforms in the two Blue units, corrupted by measurement noise
(compare with the two bottom graphs in Figure 10.2, depicting the same quantities without measurement
noise).
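The scenario described above can be sketched numerically. The following minimal Python sketch integrates a four-state bilinear attrition model of the kind described in Section 10.3 with a forward-Euler scheme. It is an illustration under stated assumptions, not the simulation used in the experiments: all numerical values (decay rates, coupling coefficients, initial platform counts, step size) are hypothetical placeholders, the Red engagement levels are assumed to be unit steps switching on at t = 30 and t = 50 (the actual shapes are those of Figure 10.1), and the Blue engagement levels are assumed constant.

```python
def simulate(alpha_r, gamma_r, alpha_b, gamma_b, eta0,
             t_end=100.0, dt=0.125, t_action1=30.0, t_action2=50.0,
             red_level=1.0, blue_level=0.0):
    """Forward-Euler integration of a four-state bilinear attrition model.

    State order: (eta_R1, eta_R2, eta_B1, eta_B2).
    gamma_b[i][j] couples Red unit j into the equation of Blue unit i;
    gamma_r[i][j] couples Blue unit j into the equation of Red unit i.
    Returns a list of (t, eta_R1, eta_R2, eta_B1, eta_B2) samples.
    """
    r1, r2, b1, b2 = eta0
    history = []
    for k in range(int(round(t_end / dt)) + 1):
        t = k * dt
        history.append((t, r1, r2, b1, b2))
        # Assumed step-shaped Red engagement levels (Action 1 and Action 2).
        pi1 = red_level if t >= t_action1 else 0.0
        pi2 = red_level if t >= t_action2 else 0.0
        pb = blue_level  # assumed constant Blue engagement level
        dr1 = -alpha_r[0] * r1 - (gamma_r[0][0] * b1 + gamma_r[0][1] * b2) * pb
        dr2 = -alpha_r[1] * r2 - (gamma_r[1][0] * b1 + gamma_r[1][1] * b2) * pb
        db1 = -alpha_b[0] * b1 - gamma_b[0][0] * r1 * pi1 - gamma_b[0][1] * r2 * pi2
        db2 = -alpha_b[1] * b2 - gamma_b[1][0] * r1 * pi1 - gamma_b[1][1] * r2 * pi2
        r1, r2 = r1 + dt * dr1, r2 + dt * dr2
        b1, b2 = b1 + dt * db1, b2 + dt * db2
    return history


# Hypothetical parameter values, for illustration only.
ALPHA_R = (0.001, 0.001)
ALPHA_B = (0.001, 0.001)
GAMMA_R = ((0.002, 0.001), (0.001, 0.002))
GAMMA_B = ((0.002, 0.001), (0.001, 0.002))
ETA0 = (10.0, 10.0, 10.0, 10.0)

trajectory = simulate(ALPHA_R, GAMMA_R, ALPHA_B, GAMMA_B, ETA0)
for t, r1, r2, b1, b2 in trajectory[::160]:  # one sample every 20 time units
    print(f"t={t:5.1f}  Blue 1={b1:6.3f}  Blue 2={b2:6.3f}")
```

In this sketch the Blue platform counts decay only through the small natural rate until the first action starts, and decay faster once each Red unit engages, which is the qualitative behavior the detection filter is meant to pick up from noisy measurements.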

Figure 10.4 shows a block diagram describing the experiment. The two (noise-corrupted) measured
observations are fed into the filter: Output 1 of this filter is expected to reveal the occurrence of
Action 1, while Output 2 is expected to reveal the occurrence of Action 2. The detection filter
considered for the experiments is the one described in (10.2), in which each gain parameter $g_i$, $i = 1,2$, is
chosen in such a way as to render

sup  inf  [  Ml +  \n?\N-i-1\\vl\l-l+\v2\l,-l)}dt  <0,  fori  =  1,2,  (10.3) 

Vi  ,t>2  Trf  JO 
Figure 10.2: Time history of the number of platforms of the units. The two top plots show the
number of platforms of the Red units, the two bottom plots the number of platforms of the
Blue units.
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Figure  10.4:  Block  diagram  describing  the  experiment. 


where $[0, t_1]$ is the interval of time over which the experiment is performed, $r_i$, $i = 1,2$, are the variables
as defined in (10.2), $v_i$, for $i = 1,2$, are the noise signals affecting the measurements of $\eta_i^B$, $\pi_i^R$, $i = 1,2$,
are the engagement enemy actions (which must be detected), Q, M, V, N are suitable weighting matrices,
and $\gamma$ is a threshold parameter representing the desired attenuation of the noise on the residual.

The experiments are performed by letting each of the parameters $\alpha_1^B$, $\alpha_2^B$, $\gamma_{11}^B$, $\gamma_{12}^B$, $\gamma_{21}^B$, $\gamma_{22}^B$ in (10.1)
differ, by increasing amounts, from their nominal values, which are used
in the design of the filter (10.2).


10.5  Example  of  experiment 


We illustrate in what follows an example of experiment in which the actual value of the parameter $\alpha_1^B$
is equal to, respectively, 110%, 120%, 130%, 140% and 150% of its nominal value. The energy of the
noise corrupting the measurements of $\eta_i^B$, $i = 1,2$, is equal to 15% of the energy of $\eta_i^B$, $i = 1,2$. Figures
10.5 and 10.6 depict the responses of the game-theoretic filter to Action 1 and Action 2, respectively.
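The noise level used here is specified as a ratio of energies. A minimal sketch of how a noise sequence could be scaled to a prescribed energy ratio is given below. The report does not state how energy is computed, so the definition used here (sum of squared samples) is an assumption, as are the function names, the Gaussian noise model and the sample signal; the `seed` argument plays the role of the random seed varied across the six responses in Figures 9.18 to 9.23.

```python
import random


def energy(samples):
    """Discrete signal energy: sum of squared samples (assumed definition)."""
    return sum(v * v for v in samples)


def add_scaled_noise(signal, ratio, seed):
    """Return (noisy_signal, noise) with energy(noise) = ratio * energy(signal)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the raw noise so that its energy matches the requested ratio.
    scale = (ratio * energy(signal) / energy(raw)) ** 0.5
    noise = [scale * v for v in raw]
    return [s + n for s, n in zip(signal, noise)], noise


# Hypothetical noise-free measurement: a slowly decaying platform count.
clean = [10.0 * (0.999 ** k) for k in range(1000)]
noisy, noise = add_scaled_noise(clean, ratio=0.15, seed=1)
print(energy(noise) / energy(clean))
```

Scaling after sampling, rather than choosing a variance in advance, makes the energy ratio hold for each individual realization and not only on average.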

Figure 10.5 shows how the response to Action 1 tends to evolve away from zero even though Action
1 is not taking place (first 30 units of time). However, the first performance signal is still “sensitive”
to the occurrence of Action 1, as demonstrated by the “bump” at t = 30 units of time (when
Action 1 occurs). Figure 10.6 reports a similar outcome: the figure shows a nonzero
performance signal in the absence of Action 2 and a “bump” at t = 50 units of time (when Action 2
occurs). The “bump” becomes less evident as the amount of uncertainty increases, making the detection
of Action 2 more difficult. We note, however, that the uncertainty in $\alpha_1^B$ does not influence the
“non-interaction” property of the two performance signals: the first performance
signal is not “sensitive” to Action 2 and the second performance signal is not “sensitive” to Action 1.
Figure 10.5: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\alpha_1$. The actual value of $\alpha_1$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.6: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\alpha_1$. The actual value of $\alpha_1$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.7: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\alpha_2$. The actual value of $\alpha_2$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.


10.6  Results  of  the  experiments 

In the remaining experiments of this series, we test the influence of the uncertainty in the parameters
$\alpha_2^B$, $\gamma_{11}^B$, $\gamma_{12}^B$, $\gamma_{21}^B$, $\gamma_{22}^B$. The responses of the filter are given in Figures 10.7 to 10.16. The actual value of
each  parameter  is  taken  equal  to,  respectively,  110%,  120%,  130%,  140%  and  150%  of  the  corresponding 
nominal  value.  We  set  the  value  of  each  parameter  in  the  model  equal  to  one  of  the  actual  values  and 
perform  a  simulation.  The  noise  on  the  measurements  is  as  specified  in  the  previous  section. 

From the analysis of the figures, it is possible to observe that three negative phenomena emerge due to
parametric uncertainty. The first one is that the performance signals tend to evolve away from zero
even when no event is occurring (see Figures 10.7 and 10.8). The second phenomenon is the loss of the
“non-interaction” property of the two performance signals. For instance, the second performance signal
tends to react to the occurrence of Action 1 (see Figure 10.10 at time t = 30 units of time), which may
lead one to infer erroneously the occurrence of Action 2 when Action 1 is actually taking place. The third
phenomenon is a reduction in the capability of the filter to “emphasize” the event signal, as is evident
from the decreased size of the “bumps” in all the performance signals.
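The first phenomenon (a residual that drifts away from zero when no action is taking place) can be reproduced with a much simpler model than the one used in the experiments. The sketch below is not the game-theoretic filter (10.2): it is a scalar plant with a nominal-model observer whose output error is used as the residual, and the decay rate, observer gain, action level and step size are hypothetical values chosen only for illustration. The plant decay rate is set to 100% through 150% of the nominal value assumed by the observer.

```python
def residual_run(scale, a_nom=0.02, gain=0.5, eta0=10.0,
                 t_action=30.0, t_end=60.0, dt=0.125, level=0.2):
    """Simulate a scalar plant and a nominal-model residual generator.

    Plant:    d(eta)/dt = -(scale * a_nom) * eta - u(t)
    Filter:   d(z)/dt   = -a_nom * z + gain * (eta - z)
    Residual: r = eta - z
    Returns a list of (t, r) samples.
    """
    a_true = scale * a_nom
    eta, z = eta0, eta0
    out = []
    for k in range(int(round(t_end / dt)) + 1):
        t = k * dt
        out.append((t, eta - z))
        u = level if t >= t_action else 0.0  # assumed step-shaped action
        d_eta = -a_true * eta - u
        d_z = -a_nom * z + gain * (eta - z)
        eta, z = eta + dt * d_eta, z + dt * d_z
    return out


def drift_before_action(scale, t_action=30.0):
    """Largest |r| observed strictly before the action starts."""
    run = residual_run(scale, t_action=t_action)
    return max(abs(r) for t, r in run if t < t_action)


for scale in (1.0, 1.1, 1.2, 1.3, 1.4, 1.5):
    print(f"actual/nominal = {scale:.1f}  "
          f"drift before action = {drift_before_action(scale):.4f}")
```

With a perfectly known parameter the residual stays at zero until the action begins; any mismatch produces a nonzero residual beforehand, and the drift grows with the mismatch. This scalar example does not capture the loss of non-interaction, which requires at least two residuals.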


10.7  Conclusions  and  Recommendations 

The game-theoretic detection and isolation filter, although proven to be effective in the selective
attenuation of measurement noise, has been shown to be relatively sensitive to uncertainty in the parameters
of the model which describes the battlefield. In particular, a progressive degradation of the
effectiveness of the detection filter has been observed as the difference between actual and nominal values of the
parameters increases.


CONCLUSIONS - HYPOTHESIS 10

The data obtained from this set of experiments show a progressive
degradation of the filter performance as the difference between actual
and nominal values of the parameters increases.
Figure 10.8: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\alpha_2$. The actual value of $\alpha_2$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.9: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\beta_{11}$. The actual value of $\beta_{11}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.10: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\beta_{11}$. The actual value of $\beta_{11}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.11: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\beta_{12}$. The actual value of $\beta_{12}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.12: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\beta_{12}$. The actual value of $\beta_{12}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.13: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\beta_{21}$. The actual value of $\beta_{21}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.14: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\beta_{21}$. The actual value of $\beta_{21}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.15: Response to Action 1 of the detection filter in the presence of uncertainty in the parameter
$\beta_{22}$. The actual value of $\beta_{22}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.

Figure 10.16: Response to Action 2 of the detection filter in the presence of uncertainty in the parameter
$\beta_{22}$. The actual value of $\beta_{22}$ varies from 110% (top graph) to 150% (bottom graph) of the nominal value.
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Chapter  11 


Experiment 11: Method of Characteristics

11.1  Executive  Summary 

The purpose of this experiment is to verify that the solution computed by the Sequential Linear-Quadratic
Method (SLQM) is the same as the Nash solution computed by the Method of Characteristics. This was
confirmed under several scenarios. In addition, systematic tests have been performed to study
robustness under two ways of enforcing constraints: penalties and explicit enforcement.
Specifically, weights for velocities, engagement intensities, final numbers of platforms and targets, as well
as maximum rated speeds have been varied. The results show that the trajectories are quite similar in
shape.


11.2  Purpose  of  the  Experiment 

The  purpose  is  to  verify  that  the  solution  computed  by  the  Sequential  Linear-Quadratic  Method  (SLQM) 
is  the  same  as  the  Nash  solution  computed  by  the  Method  of  Characteristics. 


11.3  Hypothesis  to  Prove  or  Disprove 

Both the plant and the internal model are the same, i.e., the Mission Dynamics Continuous-time Model
(MDCM). The hypothesis is that the Method of Characteristics, which can be used to determine the state and
input trajectories of the Nash solution given the final states, confirms that the solution computed by the
Sequential Linear-Quadratic Method (SLQM) is indeed the Nash solution.


11.4  Experiment  Setup 

Two scenarios will be created. The Nash solution of each will be calculated with SLQM, obtaining state
and input trajectories. The final state of each solution will then be used to solve the two-point boundary value
problem (TPBVP) of game theory by integrating the Hamiltonian system (which is derived using the
method of characteristics for partial differential equations) backwards in time. If the SLQM solution is
indeed the Nash solution, it should be the same as the solution of the TPBVP.
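The verification procedure can be illustrated on a problem small enough to check by hand. The sketch below is a deliberately simplified stand-in: it uses a single-player scalar linear-quadratic problem rather than the two-player MDCM game, and a Riccati-based feedback solution plays the role of the SLQM output. All problem data are hypothetical. The forward trajectory is computed first; its final state, together with the corresponding final costate, is then used to integrate the Hamiltonian system backwards in time, and the two trajectories are compared.

```python
def riccati_backward(a, b, q, r, s, T, n):
    """Integrate dP/dt = -(2aP - (b^2/r)P^2 + q) backwards from P(T) = s."""
    h = T / n
    P = [0.0] * (n + 1)
    P[n] = s
    for k in range(n, 0, -1):
        dP = -(2.0 * a * P[k] - (b * b / r) * P[k] ** 2 + q)
        P[k - 1] = P[k] - h * dP
    return P


def closed_loop_forward(a, b, r, P, x0, T):
    """Simulate dx/dt = a x + b u forward with feedback u = -(b/r) P(t) x."""
    n = len(P) - 1
    h = T / n
    xs = [x0]
    for k in range(n):
        u = -(b / r) * P[k] * xs[k]
        xs.append(xs[k] + h * (a * xs[k] + b * u))
    return xs


def characteristics_backward(a, b, q, r, x_T, lam_T, T, n):
    """Integrate the Hamiltonian system backwards from the final state/costate.

    dx/dt   =  a x - (b^2/r) lam
    dlam/dt = -q x - a lam
    """
    h = T / n
    x, lam = x_T, lam_T
    xs = [x]
    for _ in range(n):
        dx = a * x - (b * b / r) * lam
        dlam = -q * x - a * lam
        x, lam = x - h * dx, lam - h * dlam
        xs.append(x)
    xs.reverse()  # reorder samples so that index 0 corresponds to t = 0
    return xs


# Hypothetical problem data, for illustration only.
A, B, Q, R, S, T, N, X0 = -0.2, 1.0, 1.0, 1.0, 0.5, 2.0, 20000, 1.0

P = riccati_backward(A, B, Q, R, S, T, N)
x_forward = closed_loop_forward(A, B, R, P, X0, T)
x_backward = characteristics_backward(
    A, B, Q, R, x_forward[-1], S * x_forward[-1], T, N)
max_gap = max(abs(u - v) for u, v in zip(x_forward, x_backward))
print(f"largest gap between the two trajectories: {max_gap:.2e}")
```

The cost assumed here is the integral of (q x^2 + r u^2)/2 plus a terminal term s x(T)^2/2, so the final costate is s x(T). A small gap between the two trajectories, limited by the Euler discretization, indicates that the forward solution satisfies the necessary conditions encoded in the Hamiltonian system, which is the logic of the comparison described above.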
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11.5  Experiment  Results 

We consider the simplest case: each of the Blue and Red forces has one unit. Both the Blue unit (B1)
and the Red unit (R1) have 10 platforms. Since each unit has a 4-dimensional state and a 3-dimensional
control input, the entire model has an 8-dimensional state and a 6-dimensional control input. In this
experiment, each force has two objectives: i) to reach its specified fixed target; and ii) to reduce the
number of enemy platforms while preserving the number of its own.

Numerical  simulations  have  been  performed  for  two  different  scenarios:  “joust”  and  “cross”.  In 
“joust”,  blue  and  red  trajectories  tend  to  be  parallel  to  each  other,  whereas  in  “cross”,  they  intersect 
each  other. 

In “joust”, there is a terminal cost $\psi$ on the numbers of platforms in the payoff function. The
parameter values are $a^B = a^R = 0.05$, $b^B = b^R = 0.0$, and $p^B = p^R = 0.0$. When control penalties are
used, the parameter values are $R_{xy} = R_x^B = R_y^B = R_x^R = R_y^R = 300$, and $R_\pi = R_\pi^B = R_\pi^R = 75$. The
initial positions are given by the following coordinates relative to a theater of operations of size 100 by
100: (20,50) for B1 and (80,52) for R1. The target locations are (80,52) for
B1 and (20,50) for R1.

In “cross”, there is no terminal cost $\psi$ in the payoff function. The parameter values are $a^B = a^R =
0.05$, $b^B = 0.005$, $b^R = 1.5$, and $p^B = p^R = 0.0$. When control penalties are used, the parameter values
are $R_{xy} = R_x^B = R_y^B = R_x^R = R_y^R = 400$, and $R_\pi = R_\pi^B = R_\pi^R = 100$. The initial positions are given
by the following coordinates relative to a theater of operations of size 100 by 100: (20,50) for B1 and
(50,80) for R1. The target locations are (80,50) for B1 and (50,20) for R1.

For each scenario, three different cases have been tested. In all cases, there are penalties on the
velocities. For engagement intensities, we test penalties without explicit constraints, explicit constraints
without penalties, and both penalties and explicit constraints. The effects of varying the penalty parameters for
velocities and engagement intensities are also studied (Case 4 below).

11.5.1  Joust 

Case 1: Penalties only, without explicit constraints on velocities or intensities. (Figures 11.1-11.3)

Case 2: Explicit constraints on intensities without penalties, no constraints on velocities. (Figures
11.4-11.6)

Case 3: Both penalties and explicit constraints on velocities (of type A) and intensities. (Figures
11.7-11.9)

Case 4: Change penalty for Case 3. Now $R_{xy} = 240$, and $R_\pi = 55$. (Figures 11.10-11.12)
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Figure  11.4:  Trajectories 
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Figure  11.13:  Trajectories 


11.5.2  Cross 

Case 1: Penalties only, without explicit constraints on velocities or intensities. (Figures 11.13-11.15)

Case 2: Explicit constraints on intensities without penalties, no constraints on velocities. (Figures
11.16-11.18)

Case 3: Both penalties and explicit constraints (of type A) on velocities and intensities. (Figures
11.19-11.21)

Case 4: Change penalty for Case 3. Now $R_{xy} = 310$, and $R_\pi = 10$. (Figures 11.22-11.24)
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Figure  11.22:  Trajectories 


Remark:  Note  that  in  some  cases  where  constraints  are  enforced,  engagement  intensities  are  cut  off  at 
their  limits  (figures  9,  12,  21,  24,  and  figure  13  for  velocities). 


11.6  Analysis 

Systematic  tests  have  been  performed  to  study  two  ways  of  enforcing  constraints:  penalties  and  explicit 
enforcement.  Also,  systematic  tests  have  been  performed  to  study  robustness.  Specifically,  weights  for 
velocities,  engagement  intensities,  final  numbers  of  platforms  and  targets,  as  well  as  maximum  rated 
speeds  have  been  varied.  The  results  show  that  the  trajectories  are  quite  similar  in  shape. 

11.7  Conclusions  and  Recommendations 

We  verified  that  the  solutions  computed  by  the  Sequential  Linear-Quadratic  Method  (SLQM)  are  the 
same  as  the  Nash  solutions  computed  by  the  Method  of  Characteristics  under  several  scenarios. 
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Chapter  12 


Experiment  12:  Game  Flow  Model 


12.1  Executive  Summary 

The purpose of this experiment was to validate the Game Flow approach. Validation is meant in the
sense that the game-theoretic solution engine (i.e., the Sequential Linear-Quadratic algorithm), acting
on the Game Flow model, converges to a Nash solution that generally improves the value of the payoff
function.

The  Game  Flow  model  simulates  a  two-force  game  where  the  assets  of  each  force,  say  the  blue  or  red 
forces,  are  distributed  over  a  large  geographical  area. 

In  this  experiment,  the  game  area  was  a  square  divided  into  64  square  cells.  At  the  start  of  the  game, 
the  two  forces  were  spread  uniformly  over  the  entire  game  area,  but  the  total  strength  of  the  blue  force 
was  only  two  thirds  the  total  strength  of  the  red  force.  To  counter  this  mismatch,  the  attack  range  of  the 
blue  force  was  larger,  and  the  cost  of  movement  for  the  blue  force  was  lower  than  that  of  the  red  force. 

The goal of each force was to reach the end of the game with a minimum loss of its own strength,
while inflicting maximum damage on the opposing force. Also, each force assigned more value (larger
weight) to the cells located in the middle of the game area than to the cells located near the boundaries, so
a higher score might be earned by finishing the game with heavier strength concentration in more valuable
cells. Finally, movement of assets across the game area was penalized, so economy of movement was also
reflected in the final score of each force.

The  game  was  carried  out  for  a  specified  amount  of  time,  with  the  phases  of  the  game,  i.e.,  asset 
movement  and  attack,  evolving  uninterrupted  for  the  duration  of  the  game. 

The  SLQ  algorithm  was  used  to  find  a  Nash  equilibrium  solution  for  the  game.  In  this  experiment, 
the  solver  was  stopped  after  10  iterations,  when  the  error  (i.e.,  the  norm  of  the  velocity  updates)  was 
approximately  one  percent  of  the  original  error.  At  this  error  level,  further  iterations  had  an  insignificant 
effect  on  the  solution. 

Experimental results show that the Nash equilibrium solution found by the SLQ algorithm greatly
improved the performance of the two forces with respect to the value of the payoff function selected for
this experiment.

Qualitatively  speaking,  we  can  say  that,  in  this  scenario,  the  superiority  of  the  blue  force  in  the  attack 
range,  and  its  lower  cost  on  movement  prevailed,  allowing  the  blue  force  to  keep  the  red  force  out  of  the 
most  valuable  cells  in  the  middle  of  the  game  area. 


12.2  Purpose  of  the  Experiment 

The  Game  Flow  model  simulates  a  two-force  game  where  the  assets  of  each  force,  say  the  blue  or  red  force, 
are  distributed  over  a  large  area.  The  game  area  is  divided  into  cells  so  that  the  strength  concentration 
of  the  blue  (resp.  red)  force  in  a  cell  is  defined  as  the  amount  of  blue  (resp.  red)  asset  of  a  single  type 
contained  in  the  cell  divided  by  the  area  of  the  cell.  The  blue  and  red  forces  can  move  their  respective 
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assets  continuously  during  the  game,  by  specifying  transport  velocities  for  each  pair  of  contiguous  cells, 
i.e.,  the  rate  at  which  the  assets  are  shifted  from  one  cell  to  the  next. 

At  the  start  of  a  game,  the  two  forces  are  assigned  an  initial  strength  distribution  over  all  the  cells  in 
the  game  area.  As  the  game  proceeds,  the  initial  strength  distributions  evolve  in  different  ways,  but  the 
total  strength  of  each  force  can  only  decrease  due  to  two  types  of  strength  loss  mechanisms. 

The  first  type  of  loss  mechanism  is  characterized  by  a  local  attrition  parameter  associated  with  each 
cell.  This  may  represent  loss  due  to  mechanical  failure  or  local  weather. 

The  second  type  of  loss  mechanism  for  one  force  represents  attacks  from  the  opposing  force.  Attacks 
are  carried  out  continuously  and  simultaneously  by  the  two  forces  during  a  war  game.  For  example,  the 
blue  assets  contained  in  one  cell  at  any  one  time  will  attack  simultaneously  all  the  red  assets  which  are 
at  that  time  in  all  the  cells  within  the  attack  range  of  the  blue  force.  The  actual  damage  sustained  by 
the  red  force  in  each  cell  will  depend  on  the  strength  concentration  of  the  blue  force  in  the  attacking  cell, 
the  strength  concentration  of  the  red  force  in  the  cell  that  is  being  attacked,  and  on  the  distance  between 
the  two  cells. 

The  game  is  carried  out  for  a  specified  amount  of  time,  with  the  three  phases  of  the  game,  i.e.,  asset 
movement,  attrition  and  attack,  evolving  uninterrupted  for  the  duration  of  the  game. 
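The three phases can be illustrated with one discrete time step on a small grid. The sketch below is only a toy version of the mechanisms described above: the report does not give the damage formula in this section, so the law used in `attack` (loss proportional to both strengths and decreasing with distance) is a hypothetical choice, as are all function names, rates and the 4 x 4 grid. Cells are assumed to have unit area, so strength and strength concentration coincide.

```python
import math


def move(strength, flows, dt):
    """Shift assets between contiguous cells.

    flows maps (src, dst) -> transport rate (fraction of src strength per
    unit time). Amounts are computed from the strengths at the start of the
    step, so movement alone conserves total strength.
    """
    new = dict(strength)
    for (src, dst), rate in flows.items():
        amount = rate * strength[src] * dt
        new[src] -= amount
        new[dst] += amount
    return new


def attrit(strength, rate, dt):
    """Local attrition: each cell loses a fraction rate[cell] * dt."""
    return {c: s * (1.0 - rate[c] * dt) for c, s in strength.items()}


def attack(attacker, defender, attack_range, k, dt):
    """Return defender strengths after one step of attacks.

    Assumed (hypothetical) damage law: the loss in defender cell j caused by
    attacker cell i is k * attacker[i] * defender[j] / (1 + d_ij) * dt,
    applied only when the centre distance d_ij is within attack_range.
    """
    new = dict(defender)
    for j, dj in defender.items():
        loss = 0.0
        for i, ai in attacker.items():
            d = math.hypot(i[0] - j[0], i[1] - j[1])
            if d <= attack_range:
                loss += k * ai * dj / (1.0 + d) * dt
        new[j] = max(0.0, dj - loss)
    return new


# Hypothetical setup: uniform forces, Blue at two thirds of Red's strength.
n, dt = 4, 0.1
cells = [(i, j) for i in range(n) for j in range(n)]
blue = {c: 2.0 / 3.0 for c in cells}
red = {c: 1.0 for c in cells}
attrition = {c: 0.01 for c in cells}
blue_flows = {((0, 0), (0, 1)): 0.5}  # Blue shifts assets from (0,0) to (0,1)

blue_moved = move(blue, blue_flows, dt)
blue_worn = attrit(blue_moved, attrition, dt)
red_worn = attrit(red, attrition, dt)
# Both attacks use the pre-attack strengths, so they act simultaneously.
red_next = attack(blue_worn, red_worn, attack_range=2.0, k=0.05, dt=dt)
blue_next = attack(red_worn, blue_worn, attack_range=1.0, k=0.05, dt=dt)
print(sum(blue.values()), sum(blue_moved.values()), sum(blue_next.values()))
print(sum(red.values()), sum(red_next.values()))
```

Movement leaves the total strength of a force unchanged, while attrition and attack can only reduce it, which matches the statement that the total strength of each force can only decrease during the game.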

The  goal  of  each  force  is  to  reach  the  end  of  the  game  with  a  minimum  loss  of  their  own  strength, 
while  inflicting  maximum  damage  to  the  opposing  force.  Also,  each  force  may  assign  more  value  (larger 
weight)  to  some  of  the  cells  in  the  game  area  than  to  other  cells,  so  a  higher  score  might  be  earned 
by  finishing  the  game  with  heavier  strength  concentration  in  more  valuable  cells.  Finally,  movement  of 
assets  across  the  game  area  typically  costs  valuable  energy,  so  economy  of  movement  is  also  reflected  in 
the  final  score  of  each  force. 

The purpose of this experiment was to validate the Game Flow approach. Validation is meant in the sense
that the game-theoretic solution engine (i.e., the Sequential Linear-Quadratic algorithm), acting on the
Game Flow model, converges to a solution that generally improves the value of the game, i.e., the
payoff function value.


12.3  Hypothesis  to  Prove  or  Disprove 

The  hypothesis  that  we  tried  to  prove  in  this  experiment  is:  The  Game  Flow  model  and  the  SLQ  algorithm 
constitute a feasible tool for solving differential games involving a very large number of opposing units,
distributed  over  a  geographical  area. 


12.4  Experiment  Setup 

We  apply  the  Game  Flow  method  to  find  a  solution  for  the  game  described  by  the  following  scenario. 

The  game  area  is  a  square  of  unit  length  in  each  side,  and  is  divided  into  64  squares  to  form  an  8  x  8 
grid. 

The  game  area  may  represent  some  geographical  area  where  the  conflict  takes  place.  It  should  be 
expected  then,  that  certain  local  features  of  the  game  area  will  have  an  effect  on  the  evolution  of  the 
game.  For  example,  the  energy  that  the  forces  must  spend  to  move  their  respective  assets  should  vary 
as they attempt to go across different types of terrain: desert dunes, marshy land, dense forests, etc.
Similarly,  different  features  of  the  game  area  might  affect  the  attrition  rate  sustained  by  the  forces. 

In  this  experiment,  the  game  area  consists  of  two  types  of  terrain:  one  smooth  area,  through  which 
movement  of  assets  is  relatively  easy,  surrounded  by  more  difficult  terrain.  A  map  of  the  game  area 
is  shown  in  Figure  12.1,  with  the  smooth  area  indicated  in  dark  color.  Notice  that  the  difficulty  in  the 
terrain  need  not  be  the  same  along  the  horizontal  and  vertical  directions,  even  locally.  In  this  experiment, 
for  example,  it  is  clear  that  to  go  from  cell  (4,4)  to  cell  (3,5),  the  path  that  goes  through  cell  (3,3)  is  less 
expensive  than  the  path  that  goes  through  cell  (3,5). 

To  simplify  visualization  of  the  experiment  results,  attrition  not  caused  by  the  enemy  is  not  included 
in  this  experiment. 

Figure 12.1: Map of the game area: light colored cells indicate smooth area. The white contour line
marks the boundary between the two different regions.

The blue force has an initial strength of 128 units spread uniformly on the game area. The red force
has an initial strength of 192 units, also spread uniformly on the game area.

We assume that each force has a symmetric attack efficiency function as depicted in Figures 12.2 and
12.3. The figures show that the red force is slightly more powerful than the blue force at close range.
On the other hand, the blue force has a longer range, covering an area of approximately 3 x 3 cells.

Figure 12.2: Efficiency of attack for the blue force.


The  mission  for  the  blue  force  is  defined  as:  (a)  reach  the  end  of  the  game  with  as  much  strength  as 
possible;  (b)  place  as  many  assets  as  possible  in  the  four  cells  located  in  the  middle  of  the  game  area;  (c) 
remove  the  red  assets  from  the  four  central  cells,  and  block  any  attempts  by  the  red  force  to  move  its  own 
assets  into  that  area;  (d)  continuously  try  to  minimize  the  strength  of  the  red  force;  and  (e)  minimize 
the  control  effort  in  accomplishing  the  first  four  items  of  this  mission  statement. 

Similarly,  the  mission  for  the  red  force  is  defined  as:  (a)  reach  the  end  of  the  game  with  as  much 
strength  as  possible;  (b)  place  as  many  assets  as  possible  in  the  four  cells  located  in  the  middle  of  the 
game  area;  (c)  remove  the  blue  assets  from  the  four  central  cells,  and  block  any  attempts  by  the  blue 
force  to  move  its  own  assets  into  that  area;  (d)  continuously  try  to  minimize  the  strength  of  the  blue 
force;  and  (e)  minimize  the  control  effort  in  accomplishing  the  first  four  items  of  this  mission  statement. 

While  the  respective  mission  statements  for  the  blue  and  red  forces  may  look  the  same,  each  force 
can  assign  different  priorities  (weights)  to  the  different  mission  tasks.  For  instance,  in  this  experiment, 
the  cost  for  the  blue  force  associated  with  movement  of  assets  over  the  smooth  game  area  is  equivalent  to 
three  fourths  the  corresponding  cost  for  the  red  force.  This  means  that  the  blue  force  has  more  freedom 
of  movement  over  the  game  area  than  the  red  force. 

As  for  the  values  (weights)  assigned  by  each  force  to  the  different  cells  in  the  game  area,  the  respective 
mission  statements  imply  that  both  forces  regard  the  central  cells  more  valuable  than  the  cells  located 
near  the  edges  of  the  game  area.  For  this  experiment,  we  assume  that  the  two  forces  assign  equal  value 
to  each  cell,  so  that  a  common  map  of  the  real  estate  value  is  shown  for  the  two  forces  in  Figure  12.4. 

Figure 12.3: Efficiency of attack for the red force.

Figure 12.4: (a) Running cost associated with instant values of the strength concentration in the cells.
(b) Terminal cost associated with final values of the strength concentration in the cells. Lighter color
indicates higher value.

Table 12.1: Payoff function value for the initial guessed solution

Force    Running    Running    Running     Terminal    Game
         X-vel.     Y-vel.     Strength    Strength    Cost
Blue        0         216      -259.04     -469.68     -512.73
Red      -246           0       523.32      961.43     1238.76
Total                                                   726.03

12.5  Experiment  Results 

The  solution  technique  (SLQ  method)  is  iterative  and  it  improves  the  current  solution  estimate  at  each 
iteration.  Hence,  to  solve  the  game,  a  guess  has  to  be  made  for  the  velocity  distributions  in  the  horizontal 
and vertical directions that the forces continuously apply. The initial choice of the velocity distributions
affects the rate of convergence in the iterative solution of the game, but in our experiments it did not
have  significant  effects  on  the  final  outcome  of  the  game.  So,  in  the  experiment  we  report  here,  we 
arbitrarily  assigned  a  uniform  velocity  distribution  parallel  to  the  horizontal  direction  for  the  blue  force, 
and  a  uniform  velocity  distribution  parallel  to  the  vertical  direction  for  the  red  force. 

The  value  of  the  game  corresponding  to  the  initial  non-optimum  solution  is  shown  in  Table  12.1, 
broken  into  the  individual  cost  components. 

The  SLQ  algorithm  was  used  next  to  find  a  Nash  equilibrium  solution  for  the  game.  In  this  experiment, 
the  solver  was  stopped  after  10  iterations,  when  the  error  (i.e.,  the  norm  of  the  velocity  updates)  was 
approximately  one  percent  of  the  original  error.  At  this  error  level,  further  iterations  had  an  insignificant 
effect  on  the  solution.  Convergence  of  the  algorithm  is  illustrated  in  Figure  12.5,  which  shows  the  error 
against  iterations. 


Figure  12.5:  Convergence  of  SLQ  algorithm 

The value of the game corresponding to the Nash equilibrium solution is shown in Table 12.2, broken
into the individual cost components.

Table 12.2: Payoff function value for the Nash equilibrium solution

Force    Running    Running    Running     Terminal    Game
         X-vel.     Y-vel.     Strength    Strength    Cost
Blue      14.66      10.89     -264.24     -497.18     -735.86
Red      -18.99     -20.06      518.90     1013.80     1493.66
Total                                                   757.80
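As a cross-check on Tables 12.1 and 12.2, the short Python sketch below sums the four cost components of each force and compares the result with the tabulated game cost. The numbers are transcribed from the two tables; the dictionary and function names are illustrative only and are not part of the Game Flow program.

```python
# Cost components transcribed from Tables 12.1 and 12.2, in the order:
# running X-vel., running Y-vel., running strength, terminal strength.
INITIAL = {
    "Blue": (0.0, 216.0, -259.04, -469.68),
    "Red": (-246.0, 0.0, 523.32, 961.43),
}
NASH = {
    "Blue": (14.66, 10.89, -264.24, -497.18),
    "Red": (-18.99, -20.06, 518.90, 1013.80),
}


def game_cost(components):
    """Sum the four cost components of one force."""
    return sum(components)


def total_payoff(table):
    """Sum the game costs of both forces."""
    return sum(game_cost(components) for components in table.values())


for label, table in (("initial guess", INITIAL), ("Nash equilibrium", NASH)):
    for force, components in table.items():
        print(f"{label:>16}  {force:<5} game cost = {game_cost(components):9.2f}")
    print(f"{label:>16}  total payoff    = {total_payoff(table):9.2f}")
```

Because the tabulated components are rounded to two decimals, the recomputed sums agree with the printed game costs only to within a few hundredths.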

Clearly, the Nash equilibrium solution found by the SLQ algorithm greatly improves the performance
of the two forces in terms of the value of the payoff function selected for this experiment. Recall that
the blue force tries to minimize the total payoff while the red force tries to maximize it.

Figures 12.6 and 12.7 show the initial strength distributions for the blue and red forces. The
distributions are identical and uniform, so the shade (color) is uniform over the area. The direction of asset
movement across the border between each pair of cells is indicated by an arrow. The size of each arrow
indicates the magnitude of the corresponding initial velocity component. The contour lines mark the different
regions in the game area as defined by the cost components.

Figure 12.6: Initial Strength Distribution for Blue force. Arrows indicate magnitude of the velocity
components across the boundaries.

Figures  12.8  and  12.9  show  the  final  strength  distributions  for  the  blue  and  red  forces.  The  arrows 
indicate  the  magnitudes  of  the  velocity  components  as  the  respective  assets  of  the  two  forces  reached 
their  final  destinations. 

Figure 12.8: Final Strength Distribution for Blue force. Arrows indicate magnitude of the velocity
components across the boundaries.

Figure 12.9: Final Strength Distribution for Red force. Arrows indicate magnitude of the velocity
components across the boundaries.

The final total strength for the blue force was 20.5, while the final total strength for the red force was
29.5. Therefore, the red force conserved only 15.4% of its original strength, while the blue force conserved
16% of its own strength.
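
The conserved fractions quoted above follow directly from the initial strengths given in the experiment setup (128 units for blue, 192 units for red) and the final strengths reported here. A worked check:

```latex
\[
\frac{20.5}{128} \approx 0.160 \quad (16.0\%\ \text{for the blue force}),
\qquad
\frac{29.5}{192} \approx 0.154 \quad (15.4\%\ \text{for the red force}).
\]
```

In absolute terms the red force therefore finished with more strength (29.5 against 20.5), but with a slightly smaller fraction of what it started with.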


12.6  Analysis 

Qualitatively  speaking,  we  can  say  that,  in  this  scenario,  the  superiority  of  the  blue  force  in  the  attack 
range  prevailed,  allowing  the  blue  force  to  keep  the  red  force  out  of  the  most  valuable  cells  in  the  middle 
of  the  board.  Indeed,  the  red  force  was  forced  to  retreat  into  the  four  corners  of  the  game  area  where  less 
valuable  cells  were  to  be  found.  It  is  also  clear  that  different  transportation  costs  assigned  to  different 
regions  affected  the  way  in  which  the  blue  and  red  forces  adjusted  their  strength  concentrations  during 
the game. This can be seen in the fact that the red force’s concentration in the (1,1) corner is weaker than in the
other three corners because the terrain near (1,1) is harder to traverse.


12.7  Conclusions  and  Recommendations 

It  was  difficult  to  find  a  scenario  which  was  both  interesting,  in  the  sense  that  some  significant  amount 
of  action  occurred  during  the  game,  and  in  which  a  game  theoretic  solution  could  be  found  by  the  SLQ 
algorithm.  In  most  cases  in  which  a  solution  could  be  found,  the  velocity  terms  had  to  dominate  the 
payoff  function  so  that  little  movement  of  assets  resulted.  This  was  particularly  the  case  when  the  two 
forces  had  an  initial  strength  distribution  concentrated  in  a  small  region  of  the  game  area.  However,  it 
is  still  not  clear  at  this  time  what  features  are  most  critical  in  determining  whether  a  given  scenario  has 
an  SLQ  solution  or  not. 

With respect to computational complexity, the Game Flow solution engine executes relatively fast,
using a combination of Matlab built-in functions (e.g., ODE solvers) and custom-made C++ routines.
For example, the CPU time required to run this experiment (8 x 8 grid) was 113.5 seconds. We also
solved the same scenario using a 10 x 10 grid. This represents an increase of 60% in the size of its
computational work. The algorithm converged and the results were similar to the ones reported here.
The CPU time in this case was 273.6 seconds, which means a 141% increase over the original experiment.
These results were obtained running the Game Flow program on an 800 MHz PC with 500 MB RAM. It
should be stated that the actual CPU time required to run different scenarios may vary considerably
from these results.
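
The scaling figures above can be reproduced with a few lines of Python. The sketch below uses only the grid sizes and CPU times quoted in the text; the helper name `percent_increase` and the fitted exponent are illustrative additions. The report does not say how the "size of computational work" was measured, so the cell-count growth computed here (about 56%) need not coincide with the quoted 60%. With only two data points, the exponent is no more than a rough indication.

```python
import math


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


cells_small, cells_large = 8 * 8, 10 * 10   # number of grid cells
cpu_small, cpu_large = 113.5, 273.6         # CPU seconds quoted in the text

cell_growth = percent_increase(cells_small, cells_large)
cpu_growth = percent_increase(cpu_small, cpu_large)

# Exponent k in the assumed power law cpu ~ cells**k, fitted to the two runs.
exponent = math.log(cpu_large / cpu_small) / math.log(cells_large / cells_small)

print(f"cell count growth: {cell_growth:.2f}%")
print(f"CPU time growth:   {cpu_growth:.1f}%")
print(f"fitted exponent:   {exponent:.2f}")
```

Between these two runs the CPU time grows roughly with the square of the number of cells, which is consistent with the closing caution that timings may vary considerably across scenarios.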

In  the  current  version  of  the  Game  Flow  model,  physical  features  of  the  theater  itself  can  affect  the 
dynamics  in  two  ways:  i)  in  the  rate  of  attrition  of  the  human  and/or  mechanical  assets  in  the  field;  ii)  in 
the energy cost associated with the movement of assets in the field. The latter feature is not implemented
directly in the differential equations of the system. Instead, it is represented in the payoff functions of
the  two  forces.  A  velocity  reduction  parameter,  similar  to  an  attrition  parameter  could  be  implemented 
in  future  versions  of  the  Game  Flow  model. 

Another  characteristic  of  the  current  version  of  the  Game  Flow  model  is  that  physical  features  of 
the  theater  have  no  effect  on  the  efficiency  of  attack.  Hence,  the  enemy  cannot  hide  behind  a  mountain 
range,  for  instance.  Also  the  efficiency  of  attack  functions  can  have  a  directionality  associated  with  them 
before  the  beginning  of  a  game,  but  these  cannot  be  rotated  or  reoriented  as  the  game  evolves.  This 
could  be  an  important  addition  to  enhance  the  strategic  capabilities  of  the  Game  Flow  model.  A  similar 
concept  regarding  shields,  that  would  protect  the  assets  positioned  behind  them,  might  conceivably  be 
incorporated  as  well. 

It  is  hoped  that  by  adding  complexity  to  the  model,  the  class  of  interesting  problems  that  can  be 
solved  with  the  Game  Flow  program  would  be  enlarged. 

Chapter 13


Experiment  13:  Discrete  Platform 
Dynamics 

13.1  Executive  Summary 

In any type of battlefield, the loss of platforms is usually a stochastic discrete event over time. In
JFACC simulations, however, the number of platforms has been modeled as a real number representing
its probabilistic expectation. Our game-theoretic controller, which is based on an expected-value
model, therefore needs to be tested on a more realistic plant, in which the numbers of platforms are integers.
Our approach was to first develop a model in which the dynamics of the number of platforms is a stochastic
discrete-event equation, i.e., in our new stochastic discrete-event model, the number of platforms in each
unit is an integer. Hence, the number of platforms changes from 10 to 9 at one point and then to 8
later, based on the probability of kill. Moreover, the numbers of platforms vary differently
for different runs due to the random number generators, which control the time when an actual kill occurs.
Using this new model, we conducted multiple runs and took an average. This average was then compared
against the results based on the expected-value model. We concluded that our game-theoretical controller
(based on the simpler expected-value model) performed just as well when tested on this more realistic
stochastic discrete-event plant model as when tested on the expected-value plant model.

13.2 Introduction


In the JFACC simulations, the number of platforms is modeled as a real number representing its
probabilistic expectation. Here we will investigate the effect of this assumption on the game-theoretic controller.
First, in Section 13.3, we form a hypothesis. Then, in Section 13.4, we describe a stochastic discrete-event
model, where the number of platforms in each unit is an integer and varies differently for different runs
due to random number generation. Because of the randomness, we needed to make multiple runs and
take an average in order to analyze the effectiveness of the game-theoretical controller. This experiment
and the methods are fully described in Section 13.5. The average of multiple runs is compared against
the expected-value results and we form a conclusion in Section 13.6.

13.3  Hypothesis  to  Prove 

The  hypothesis  is  that  we  will  not  find  any  notable  differences  in  the  controller  performance  when  the 
controller  based  on  the  expected-value  model  is  applied  to  the  more  realistic  stochastic  discrete-event 
model. 


13.4  Stochastic  Discrete  Model  Description 

Note:  In  the  following  derivation,  we  proceed  with  a  generic  unit  without  specifying  Blue  or  Red  since  the 
model  is  identical  for  both  teams.  However,  in  its  programming  implementation,  the  equation  for  the  Red 
and  Blue  forces  are  written  separately. 


As described in the report [1] on the Mission Dynamics Continuous-time Model (MDCM), the
dynamics for the number $\eta$ of platforms are given by

$$\dot{\eta} = -\lambda \eta. \tag{13.1}$$

Here $\dot{\eta} = d\eta/dt$, and $\lambda$ is defined by

$$\lambda(t) = p \, P_k \, \phi \, \pi, \tag{13.2}$$

where $p$ is the acquisition rate, $P_k$ is the probability of kill, $\phi$ is a function dependent on the distance
between the units, and $\pi$ is the fire intensity.

To approximate the model, we use the forward difference approximation

$$\dot{\eta}(t) \approx \frac{\eta(t + \Delta t) - \eta(t)}{\Delta t}.$$

Then the approximate number of platforms lost over a $\Delta t$ time interval is

$$\eta(t) - \eta(t + \Delta t) \approx \lambda(t)\,\eta(t)\,\Delta t.$$

To take advantage of this approximation we must assume that $\Delta t$ is small; furthermore, we will want to
choose $\Delta t$ such that

$$\sup_t \{\lambda(t)\,\eta(t)\,\Delta t\} < 1.$$

With this assumption, we view $\lambda(t)\eta(t)\Delta t$ as the probability of losing a platform in the interval $[t, t + \Delta t]$.
Now define $R$ as a uniformly distributed random number between zero and one. Then if $R < \lambda(t)\eta(t)\Delta t$,
a platform is lost in the interval of $\Delta t$. We can now define the stochastic discrete event dynamics for the
number of platforms as

$$\eta(t + \Delta t) = \begin{cases} \eta(t) - 1 & \text{if } R < \lambda(t)\,\eta(t)\,\Delta t, \\ \eta(t) & \text{else,} \end{cases} \tag{13.3}$$

where $R$ changes with each update of the discrete system, that is, we compute a new random number $R$
at each $\Delta t$ update.
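
The update rule of Equation (13.3) can be illustrated with a minimal Python sketch. It is not the MDCM-SD implementation: it assumes a single unit with a constant loss rate `lam` (in the report the rate depends on time through the distance and fire-intensity terms), and all function names and parameter values are hypothetical. Under that assumption the expected value of the discrete process follows the forward-difference recursion exactly, so the average over many runs should approach `eta0 * (1 - lam * dt) ** steps`.

```python
import random


def simulate_platforms(eta0, lam, dt, t_end, rng):
    """Return the final platform count of one sample path of Equation (13.3).

    Assumes a constant loss rate lam, which is a simplification.
    """
    eta = eta0
    for _ in range(int(round(t_end / dt))):
        p_loss = lam * eta * dt  # probability of one loss in [t, t + dt]
        if p_loss >= 1.0:
            raise ValueError("dt is too large: lam * eta * dt must stay below 1")
        if rng.random() < p_loss:  # a new random number R at every update
            eta -= 1
    return eta


def average_final_count(eta0, lam, dt, t_end, runs, seed=0):
    """Average the final platform count over independent sample runs."""
    rng = random.Random(seed)
    total = sum(simulate_platforms(eta0, lam, dt, t_end, rng) for _ in range(runs))
    return total / runs


def euler_expected_count(eta0, lam, dt, t_end):
    """Expected final count implied by the forward-difference recursion."""
    steps = int(round(t_end / dt))
    return eta0 * (1.0 - lam * dt) ** steps


# Hypothetical values chosen only for illustration.
eta0, lam, dt, t_end = 10, 0.05, 0.05, 10.0
print("average over 100 runs:", average_final_count(eta0, lam, dt, t_end, runs=100))
print("expected-value model: ", euler_expected_count(eta0, lam, dt, t_end))
```

Each single run returns an integer count, while the average over runs is a real number that can be compared with the expected-value trajectory, which mirrors the comparison carried out in this experiment.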

13.5 Experiment and Methods

To test the new model, we implement the discrete dynamics for the number of platforms into the plant
of the MDCM model while the internal model for the controller remains MDCM. We call the new model
with discrete dynamics Mission Dynamics Continuous Model-Stochastic Discrete (MDCM-SD). The
scenario used is the cross 11 scenario, which is summarized in Table 13.1. The weights in the table are
for the quadratic cost function for the nonlinear game.


cross 11                            Blue      Red
Number of Units                     1.0       1.0
Number of Platforms                 10.0      10.0
$P_k$                               0.8       0.8
$p$                                 0.5       0.5
$\xi^{(1)}(0)$ (km)                 80.0      50.0
$\xi^{(2)}(0)$ (km)                 50.0      20.0
Weight: Distance to Target Cost     0.1       0.1
Weight: Running Platform Cost       0.01      3.0
Weight: Speed Cost                  200.0     200.0
Weight: Terminal Platform Cost      0.0       0.0
Weight: Terminal Target Cost        0.0       0.0
Weight: Terminal Speed Cost         0.0       0.0

Table 13.1: Cross 11 Scenario description


Next, to show that the plant dynamics are indeed discrete, and that the controller acts effectively, we
conduct one sample run as illustrated in Figures 13.1-13.5. We can see in Figure 13.2 that the dynamics
of MDCM-SD are in fact discrete. Through one run, we see that the Red dynamics for the number of
platforms match closely with the MDCM model but the Blue dynamics for the number of platforms do
not, and for both Blue and Red, the discrete dynamics lag behind the continuous dynamics. This can be
explained by examining how we approximated the continuous model. Since the approximation relies on
a forward difference, the discrete dynamics have a delayed “reaction”. This is
because the current update of the discrete equation depends on the previous time (see Equation 13.3).
Yet, as we decrease $\Delta t$, the lag should become less noticeable.

To understand how well the controller performs on the MDCM-SD model, compared to the MDCM
model, we need to run the MDCM-SD simulation multiple times and take an average since the
discrete dynamics depend on the generation of random numbers. We have simulated and compared the
trajectories of MDCM-SD and MDCM over a hundred sample runs and over a range of update times,
$\Delta t$ = 0.001, 0.01 and 1.0 min. The comparisons are shown in Figures 13.6-13.20. Notice that, by taking
the average over a hundred sample runs, we are producing an approximation
to the actual continuous dynamics.

In all the comparisons, and most importantly in the comparisons of the dynamics for the number of platforms,
we do not see much dependence on the update times $\Delta t$ we chose. This is good since smaller update times
require more computation time, although the differences in computation time are on the order
of a few minutes.


13.6  Conclusion 

We have shown that the game controller performs as well on the more realistic stochastic discrete-event
battlefield plant as on the less realistic expected-value battlefield plant. Further experiments, using a less
trivial scenario, are currently under way at Washington University, and preliminary results have shown
that the controller performance remains unaffected, though we were unable to include a complete analysis
at the time this report was written.

Figure 13.5: Comparison of MDCM (- -) and MDCM-SD (-) for Speed (One sample run)


               MDCM        MDCM-SD
$\eta^B(T)$    7.47        9
$\eta^R(T)$    4.18        4
Game Cost      -4545.5     -4631.0

Table 13.2: Summary of Results for One Sample Run with $\Delta t$ = 0.1.


Figure 13.6: Comparison of MDCM (- -) and Averaged MDCM-SD (-) for Game Trajectories ($\Delta t$ =
0.001).


               MDCM        MDCM-SD
$\eta^B(T)$    7.47        7.87
$\eta^R(T)$    4.18        4.09
Game Cost      -4545.5     -4507.1

Table 13.3: Summary of Results Averaged over 100 Sample Runs for $\Delta t$ = 0.001.


Figure 13.11: Comparison of MDCM (- -) and Averaged MDCM-SD (-) for Game Trajectories ($\Delta t$ =
0.01).


               MDCM        MDCM-SD
$\eta^B(T)$    7.47        7.78
$\eta^R(T)$    4.18        4.18
Game Cost      -4545.5     -4499.5

Table 13.4: Summary of Results Averaged over 100 Sample Runs for $\Delta t$ = 0.01.


Figure  13.15:  Comparison  of  MDCM  (-  -)  and  Averaged  MDCM-SD  (-)  for  Speed  (, 

Chapter 14

Experiment  14:  Non-linear  Detector 
for  the  Fully  Non-Linear  Model 

14.1  Executive  Summary 

In Chapters 9 and 10 we have reported the results of experiments performed to test the effectiveness
of a “game-theoretic-optimal” detection filter to process noise-corrupted observations of the battlefield.
In that series of experiments, a bilinear approximation of the non-linear model of the battlefield was
considered  and  the  filter  was  designed  accordingly.  When  the  fully  non-linear  model  of  the  battlefield 
is  considered,  a  different  (non-linear)  detection  filter  must  be  designed.  The  purpose  of  this  Chapter 
is  to  present  the  experimental  results  concerning  the  non-linear  filter  and  to  compare  them  with  those 
obtained  by  using  the  detection  filter  designed  on  the  basis  of  the  bilinear  model  of  the  battlefield.  For 
the  sake  of  simplicity,  the  case  of  noise-free  measurements  will  be  considered  in  this  series  of  experiments. 


14.2  Purpose  of  the  Experiment 


This  section  of  the  report  describes  experiments  on  detection  and  isolation  of  multiple  enemy  actions 
in  a  battlefield.  The  mathematical  description  of  the  battlefield  used  here  is  the  one  introduced  in  the 
Appendix  1.12  of  Chapter  1  and  there  validated  under  the  “uncoordinated  target  selection,  independent 
target  acquisition”  assumption.  We  consider  the  case  in  which  two  opposing  forces  are  present  in  the 
theater  of  operations,  the  Blue  force  (the  “friends”)  and  the  Red  force  (the  “enemies”).  Each  force 
consists  of  two  units  and  each  unit  consists  of  a  number  of  platforms  whose  evolution  in  time  is  described 
by  a  first  order  nonlinear  differential  equation  and  depends  on  the  “actions”  which  the  opposing  units 
are  performing  against  the  unit  in  question.  If  any  “new”  action  is  performed  by  any  of  the  opposing 
units,  this  affects  the  evolution  of  the  number  of  platforms  of  the  other  force’s  units.  Each  unit’s  location 
is  represented  by  a  point  on  the  plane  and  its  motion  is  described  by  an  ordinary  differential  equation 
depending on speed control inputs. Denoting by $\xi_i^R = \mathrm{col}(\xi_{ix}^R, \xi_{iy}^R) \in \mathbb{R}^2$ and $\xi_i^B = \mathrm{col}(\xi_{ix}^B, \xi_{iy}^B) \in \mathbb{R}^2$,
with $i = 1,2$, the position vectors of the Red and - respectively - Blue units on the plane, the equations
of the motion in question are¹


=  aiV*W 

= 

= 

¹ The reader is referred to Chapter 1 for a definition of the parameters appearing in equations (14.1) and (14.2).

where $\mu_i^R = \mathrm{col}(\mu_{ix}^R, \mu_{iy}^R) \in \mathbb{R}^2$ and $\mu_i^B = \mathrm{col}(\mu_{ix}^B, \mu_{iy}^B) \in \mathbb{R}^2$ are the speed control inputs of the Red and
Blue units, respectively. Letting $\eta_i^R$ and $\eta_i^B$, with $i = 1,2$, denote the number of platforms of the $i$-th
Red and - respectively - $i$-th Blue unit, the model of evolution of the number of platforms is a
four-dimensional nonlinear system described by two pairs of equations of the form (cf. Chapter 1, formula
(1.26))


_d 
d t 


j-  1 

2 


n?(t)(aR  +  (Opfi- P&fi (ICf  M  -  • 


3  =  1 


(14.2) 


In these equations, $\pi_{ij}^R(\cdot)$ and $\pi_{ij}^B(\cdot)$ are (independent) input variables representing the “level of
engagement” of the $j$-th Red unit with the $i$-th Blue unit and - respectively - of the $j$-th Blue unit with
the $i$-th Red unit. For convenience, we suppose in all our experiments that


$$\pi_{11}^R(t) = \pi_{21}^R(t) =: \pi_1^R(t), \qquad \pi_{12}^R(t) = \pi_{22}^R(t) =: \pi_2^R(t).$$


This means, in the terminology of [1], page 2, that we allow the “unique target constraint” to be violated
(for the Red units only). In contrast to what was assumed in the experiments reported
in Chapters 9 and 10, in this series of experiments we consider the case in which the effect of
the action performed by a unit on the number of platforms of an opposing unit depends on a measure
of the distance between the two units through the function $\psi$. Following Chapter 1, we take as $\psi$ an
exponential function which depends on the measure $r$ of the distance between the two units, namely

$$\psi(r) = e^{-r/r_0},$$

with $r_0$ a suitable scalar parameter. If $\xi_i$, $\xi_j$ are the vectors denoting the position of two units, their
distance $r$ is chosen equal to the quantity $|\xi_i - \xi_j|_\infty = \max(|\xi_{ix} - \xi_{jx}|, |\xi_{iy} - \xi_{jy}|)$, which is the $\infty$-norm
of the vector $\xi_i - \xi_j$. For simplicity, we henceforth drop the subscript $\infty$ when denoting this norm.
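
As a small illustration of the distance weighting just described, the following Python sketch evaluates the exponential function of the infinity-norm distance between two units. The function names, the unit positions and the value of the scale parameter `r0` are hypothetical; the report does not state the value of the scale parameter in this section.

```python
import math


def inf_norm_distance(xi_i, xi_j):
    """Infinity-norm distance between two planar positions (x, y)."""
    return max(abs(xi_i[0] - xi_j[0]), abs(xi_i[1] - xi_j[1]))


def psi(r, r0):
    """Exponential attenuation exp(-r / r0) of an action with distance r."""
    if r0 <= 0:
        raise ValueError("r0 must be positive")
    return math.exp(-r / r0)


# Hypothetical positions and scale parameter, for illustration only.
blue_unit = (3.0, 1.0)
red_unit = (0.0, 2.0)
r0 = 3.0

r = inf_norm_distance(blue_unit, red_unit)
print("distance r =", r)
print("psi(r)     =", psi(r, r0))
```

With this choice the weighting equals 1 when the two units coincide and decays toward 0 as they move apart, so an engagement action has less effect on a distant unit.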

The basic problem addressed in our series of experiments on the design of filters for the detection of
enemy actions is the following one: we monitor only the position, the number of platforms and speed control
inputs of the two Blue units (i.e. we measure only the values of the four state variables $\xi_1^B$, $\xi_2^B$, $\eta_1^B$, $\eta_2^B$ and
of the inputs $\mu_1^B$, $\mu_2^B$) and we want to detect the occurrence of an “engagement action” from either one
of the two Red units (i.e., we want to detect when either one of the two input signals $\pi_i^R(\cdot)$ has become
nonzero). Implicit in this is the assumption that the four other state variables $\xi_1^R$, $\xi_2^R$, $\eta_1^R$, $\eta_2^R$ (number
of platforms of the two enemy units) as well as all the input variables $\pi_{ij}^B$, $i,j = 1,2$, $\mu_1^R$, $\mu_2^R$ and $\pi_i^R$,
$i = 1,2$, are not monitored. The purpose of the detection process is precisely the determination of when
either $\pi_1^R$ or $\pi_2^R$ has become nonzero, without having it directly measured.


14.3  Hypothesis  to  Prove  or  Disprove 

Let  us  consider  the  systems  (14.1)  and  (14.2),  where,  for  the  sake  of  notational  simplicity,  the  latter  is 
rewritten  as 

=  -aBr]f{t)  -  7n»?f  (OV'GCfW  -  Cf  (0I)^(<) 

=  -<xBV2(t)  -  7?i»7? (OVKIffW  -  (*)l)*2i(0  -  7B2»?f(01/'(l^2i(<)  -  £?(*)l)7If2(*) 

(t)  =  —aRr)i  (t)  -  7n<(£)V’(|£f  (0  -  Si  (*)|  )*f  (t)  -  7i2»72iWlHl£f  (<)  -  ffWIKa  (<) 

(()  =  —aRr)2  (t)  -  'r&Vl'iVi’iltfit)  -  (Ol)wf  (t)  -  72R2*?2iWV’(l£23 (0  -  ?2i(t)l)7rB(f)  > 

(14.3) 

with $\gamma_{ij}^B$, $\gamma_{ij}^R$, $i, j = 1,2$, suitable parameters. As in the equations above the two actions to detect $\pi_1^R$ and
$\pi_2^R$ appear multiplied by terms which depend on the unmeasured functions $\eta_1^R$, $\eta_2^R$, and $\xi_i^R$, the main
challenge in this problem is to distinguish not only the two actions $\pi_1^R$, $\pi_2^R$ from each other, but also from
the “disturbance” action of $\eta_1^R$, $\eta_2^R$, and $\xi_i^R$. In particular, according to the model in (14.3), the effect
of an “engagement action” by a unit on the evolution of the number of platforms of an opposing unit will
depend on their positions. Since the number of platforms of the Blue units is processed by the filter, its
capability to detect and isolate actions will be related to the location of the units. We fix geographical
areas for the Red and the Blue units in the theater of operation, and assume that the units are moving in
those zones. Under this assumption, we design a detection filter following the methods of [2], [3]. This filter
receives as inputs the four observed variables $\xi_1^B$, $\xi_2^B$, $\eta_1^B$, $\eta_2^B$ and the measured inputs $\mu_i^B$, $i = 1,2$,
and generates as outputs two signals $r_1$, $r_2$, called performance signals (typically known also with the
name of residuals), in such a way that $r_i(t)$ is zero if the Red unit $i$ is not engaged with the Blue units at
time $t$ (i.e. if $\pi_i^R(t) = 0$), and that $r_i(t)$ is nonzero if the Red unit $i$ is engaged with the Blue units (i.e.
if $\pi_i^R(t) \neq 0$), no matter what the locations of the four units in the assigned areas are. Specifically, this
filter is modeled by equations of the form


Vi(t)  = 

hit)  = 

n(t)  =  Viit) 

r2(t)  =  rfi(t) 


\t) 

"B(0 


j&miM 

722  V’d^fWI) 


a  Vl(t)  75d*U(K2B(0l)J%()  9liVl 
m{  )  yfidt  {t)+ 92(7,2  ^  y&mfm 


tB  (±\ 


(14.4) 


7f2  Wl) 


m  (t)  -hit), 


721 


7n  m?m 


v?  it) -hit). 


These equations contain terms which depend on the positions of the Blue units and on their rate of
change, which are quantities available for measurement. In order to compare these equations with those
which describe the linear filter (9.3) (or (10.3)) in Chapter 9 (or 10), one can observe that the second
term on the right-hand side of the first two equations in (14.4) is taken identically zero in the linear filter,
while the ratio $\psi(|\xi_1^B|)/\psi(|\xi_2^B|)$ (or $\psi(|\xi_2^B|)/\psi(|\xi_1^B|)$) is taken equal to 1. The parameters $g_1$, $g_2$ are “gain
parameters” to be designed. In the problem considered in Chapter 9 (or 10) the design of $g_1$ and $g_2$
was critical in order to obtain a filter which was able to selectively reduce the effect of the measurement
noise while not attenuating the signal associated with the action to detect. Since we consider the case of
noise-free measurements for this series of experiments, it is enough to design $g_1$ and $g_2$ so as to guarantee
the stability of the filter.


14.4  Experiment  Setup 

The  equations  which  define  filter  (14.4)  depend  on  the  location  of  the  Blue  Units.  To  test  the  effectiveness 
of  the  detection  filter  we  fix  geographical  areas  in  the  theater  of  operations  in  which  the  motion  of  the 
Blue  and  Red  Units  can  evolve  (see  Figure  14.1).  We  consider  the  case  in  which  the  two  Red  Units  are 
allowed  to  move  in  the  red-dashed  area  in  Figure  14.1,  whereas  the  two  Blue  Units  can  evolve  in  the 
blue-dashed  area.  We  note  explicitly  that,  although  the  region  where  the  Red  units  are  allowed  to  move 
is known, we do not know the positions $\xi_1^R$ and $\xi_2^R$ of the two Red units. The four units will evolve along
trajectories confined to the areas introduced above and according to the law (14.1). An example of such
trajectories is depicted in Figure 14.2. The evolution of the number of platforms is modeled as in equation
(14.2). The input variables, representing the level of engagement of the battling units, are functions of
time which vary with different scenarios. For instance, in the first experiment we consider, the two levels
of engagement of the Red units 1 and 2 versus the Blue units vary with time as shown in Figure 14.3,
where “Action 1” represents the level of engagement of Red unit 1 and “Action 2” represents the level of
engagement of Red unit 2. Note that the first action occurs at t = 40 units of time, whereas the second
action takes place at t = 25 units of time, so that the first action takes place while the second action is
still occurring. The corresponding evolution in time of the number of Red and Blue platforms for each unit
is plotted in Figure 14.4. Recall that the number of platforms of the Blue units is measured and fed
into the detection filter, along with the measurement of their positions and speed control inputs. The
filter generates the two outputs $r_1$, $r_2$ according to the equations (14.4), which must reveal the occurrence
of Action 1 and Action 2, respectively.

Figure 14.3: Engagement actions of the Red units versus the Blue units in Experiment 1.


14.5  Example  of  Experiment 

For the first experiment (henceforth referred to as “Experiment 1”), we consider the scenario in which the
four units present in the battlefield move along the trajectories in Figure 14.2. The two Blue
units are subject to the engagement actions depicted in Figure 14.3 due to the Red units. The measured
number of platforms of the Blue units, which is processed by the detection filter, evolves according
to the time behaviour in the plot on the left side of Figure 14.4. Although a change in the profile of the
two graphs can be noticed at t = 25 and t = 40 units of time, denoting an increased
level of engagement due to the action of the Red units, it cannot be inferred from these graphs which
one of the two units is actually increasing its level of engagement versus the opposing units. As a matter
of fact, only the outputs of the detection filter can clearly reveal the occurrence of the two Red actions
and distinguish between them. Figure 14.5 shows the time profile of the action to detect (Action 1) and
of the Performance Signal 1 generated by the detector ($r_1$). It is seen that the performance signal decays
to zero after a transient, does not “react” to the occurrence of Action 2, and becomes evidently
nonzero only when Action 1 occurs. In this way, it is possible to infer the occurrence of Action 1 without
confusing it with the occurrence of Action 2. Analogously, Figure 14.6 shows how the Performance
Signal 2 allows the detection of Action 2. For the sake of completeness, the evolution of
the internal states of the detection filter is reported in Figure 14.7. In order to assess the behaviour of
the non-linear filter, the time profiles of the two performance signals generated by the linear detection
filter are illustrated in the next two figures. Namely, the responses of the Performance Signals 1 and 2
of the linear filter are given in Figures 14.8 and 14.9, respectively. Figure 14.8 clearly shows how the
“noninteractive” property of the filter is lost. In fact, the Performance Signal 1, which ideally should
be zero until Action 1 occurs, becomes nonzero in response to the occurrence of Action 2 at time t = 25.
This is due to the nonlinear terms present in the model (14.2), which are not taken into account by the
linear detection filter.

Figure 14.8: Actions 1 and 2 and Performance Signal 1 for the Linear Detection Filter in Experiment 1.
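
The way a performance signal is read in this section (ignore the initial transient, then look for the signal becoming evidently nonzero) can be sketched as a simple threshold test. This is only an illustrative sketch: the function name `detect_onset`, the `threshold` and `settle_time` parameters, and the synthetic residual are assumptions introduced here, not quantities taken from the experiments.

```python
from typing import Optional, Sequence


def detect_onset(
    times: Sequence[float],
    residual: Sequence[float],
    threshold: float,
    settle_time: float,
) -> Optional[float]:
    """Return the first time at or after `settle_time` at which |residual|
    exceeds `threshold`, or None if it never does.

    `threshold` and `settle_time` are hypothetical tuning knobs, not values
    taken from the report.
    """
    for t, r in zip(times, residual):
        if t >= settle_time and abs(r) > threshold:
            return t
    return None


# Synthetic residual (illustration only): a decaying initial transient,
# followed by a step-like response once the monitored action starts at t = 40.
times = [float(k) for k in range(0, 61)]
residual = [0.5 * 0.5 ** t + (0.2 if t >= 40 else 0.0) for t in times]

onset = detect_onset(times, residual, threshold=0.05, settle_time=10.0)
print(onset)
```

Skipping the transient matters: with `settle_time=0.0` the same test would fire at the very first sample, because the filter state has not yet settled.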


14.6  Results  of  the  Experiments 

We report in this section the outcome of experiments performed by varying the engagement actions of the
two Red Units and the trajectories followed on the plane by the Red and Blue Units. The non-linear
model of the battlefield depends on such functions, and so does the evolution in time of the number of
platforms of the Blue units, which feeds the non-linear detection filter. It will be seen that, despite the
changes in the signals which drive the filter, its capability to detect and isolate the two Red unit actions
is unaltered.

Consider the following scenario for the second experiment. As in the experiment in the previous section,
the four units present in the battlefield follow the trajectories in Figure 14.2. The actions of the
Red Units versus the Blue Units have different occurrence times with respect to the actions considered
in Experiment 1: Action 1 occurs at time t = 25, whereas Action 2 occurs at time t = 40 (see Figure
14.10). How these different actions change the time behaviour of the number of platforms
of the Blue units is shown in Figure 14.11. The responses of the detection filter to the two actions are
depicted in Figures 14.12 and 14.13. The responses of the performance signals of the linear filter are
reported in Figures 14.14 and 14.15. As in Experiment 1, the “non-interaction” property of the linear
filter is lost due to the nonlinear terms. In particular, the Performance Signal 2 erroneously detects
Action 2.

Figure 14.15: Actions 1 and 2 and Performance Signal 2 of the linear filter in Experiment 2.

The next experiment aims to validate the hypothesis that the detector’s performance is not affected
by a change in the trajectories of the units in the battlefield. Consider the case in which the four units’
motions are those depicted in Figure 14.16. The evolution of the number of platforms of the Blue Units can
be compared with the same quantities obtained in Experiment 1 in Figure 14.17. The two performance
signals generated by the non-linear detection filter are given in Figures 14.18 and 14.19. Figures 14.20
and 14.21 show the performance signals of the linear detection filter.

14.7  Conclusions  and  Recommendations 

The detection and isolation of actions of the “enemy” units versus the “friendly” units in a non-linear
model of the battlefield requires the use of a non-linear detection and isolation filter. Since the
non-linearities amount to terms which depend on the position of the battling units in the field of operations,
the non-linear filter utilizes as inputs the position and the speed control inputs of the friendly units in
addition to the number of platforms, which is the only input used by the linear detection filter. It turns out
that the non-linear detection filter detects and isolates concurrent actions by the opposing units, whereas
the linear filter does not and confounds the two actions to detect. Although not considered in this series
of experiments, the case of noisy observations of the number of platforms can be handled similarly to what
has been done in Chapters 9 and 10, with an optimal choice of the gain parameters which define the
detection filter. This reduces to the solution of a Riccati equation and yields a detection filter with higher
computational complexity.

Chapter 15


Experiment  15:  Comparison  with 
Honeywell’s  Results 

15.1  Executive  Summary 

A comparison of the platform loss and probability of success values is made between Washington
University and Honeywell results on two example missions, each consisting of three sorties. The results are
similar in the first example. Due to a change in the initial number of Red fighters and their probability
of kill, the outcome of the second example is drastically different. It has also been observed that the
selection of weights in the cost function may affect the unit trajectories and platform loss significantly.

Despite  running  our  Sequential  Linear-Quadratic  Method  for  50  iterations  or  more,  convergence  to  a 
possible  Nash  solution  was  not  achieved  in  either  example,  although  the  obtained  unit  trajectories  and 
platform  loss  numbers  were  reasonable,  given  the  mission  objectives. 


15.2  Introduction 

The  purpose  of  this  experiment  is  to  compare  the  results  obtained  by  the  Honeywell  Team  with  those  of 
the  Washington  University  Team,  using  a  common  scenario  and  task  description. 

The Honeywell approach is based on a discrete-transition Markov chain model of combat between
teams of Red and Blue units, and describes platform attrition during a sequence of sorties to accomplish
a given mission [1]. In this model, target selection coordination and cooperation between friendly units are
explicitly taken into account, but terrain features, initial unit locations and the routes followed by units
during a sortie are assumed to be lumped into a single parameter, called the “lethality” of a particular
type of unit against a particular type of enemy unit. This parameter determines state transitions of the
number of platforms during the mission.

Honeywell’s  Model  Predictive  Controller  [2]  is  based  on  a  component  called  the  “Initial  Deployment 
Optimizer.”  Given  the  lethality  matrix,  the  number  of  platforms  and  decoys  in  each  Red  unit,  the 
maximum  number  of  rounds  (sorties)  in  the  mission  and  the  desired  probability  of  win,  this  component 
calculates  the  minimum  number  of  Blue  platforms  to  deploy  in  the  first  sortie,  assuming  that  all  survivors 
of  a  sortie  will  be  reassigned  to  the  next  one,  and  no  reserves  will  be  called  in  to  join  either  Red  or  Blue 
teams. 

The  Washington  University  model  starts  from  similar  arguments  of  target  acquisition  and  target 
selection  coordination,  derives  a  continuous-transition  Markov  chain  for  platform  attrition,  and  then 
approximates  the  evolution  of  the  expected  values  of  the  number  of  platforms  in  both  Red  and  Blue 
teams by a low-order ordinary differential equation. This is combined with unit motion on a
two-dimensional battle space and weapon expenditure. In this way, the location and motion of units, the
effect of distance on weapon effectiveness and cooperation of friendly units are taken into account. The
state transition of the number of platforms is determined by two parameters: the target acquisition rate
and the probability of kill for a particular type of platform against a particular type of enemy platform
per firing. As opposed to the aggregate concept of “lethality”, these low-level parameters are determined
mainly by the properties of the search devices and the weapon systems, given weather conditions, and
thus are independent of the engagement rules, the strength of the teams or the synergy between friendly
units.

One  component  of  the  Washington  University  effort  is  the  calculation  of  the  game  theoretic  optimal 
control  (for  both  Red  and  Blue  teams)  using  the  Sequential  Linear-Quadratic  Method  (SLQM),  given 
the  initial  state  (number  of  platforms  and  weapons  in  the  units,  and  unit  locations),  and  a  cost  function 
which  encompasses  the  trade-off  between  accomplishing  the  mission  (e.g.,  reducing  the  number  of  enemy 
platforms),  the  value  of  friendly  assets,  fuel  and  weapon  consumption.  The  Nash  solution  (of  the  zero-sum 
differential  game)  computed  by  this  component  will  depend  on  how  the  weights  are  chosen  in  the  cost 
function. 

It  is  possible  to  combine  Honeywell’s  “Initial  Deployment  Optimizer”  and  Washington  University’s 
“Game  Theoretic  Tactical  Solution”  components,  in  an  iterative  loop  for  improving  lethality  estimates. 
This  idea  is  described  in  the  flowchart  in  Fig.  15.1. 

Before  this  idea  is  implemented,  it  is  useful  to  compare  the  results  of  the  Honeywell  and  Washington 
U.  models  and  controllers.  The  rest  of  this  report  consists  of  this  comparison  based  on  two  example 
scenarios  proposed  by  Honeywell. 


15.3  Experiment  Setup 

The  scenarios  used  for  the  comparison  are  best  described  by  the  following  quotation  from  [3]: 

“Blue  is  tasked  with  the  objective  to  destroy  a  ground  target  in  three  missions  (sorties)  or 
less.  On  its  way  to  the  target,  his  strike  package  will  encounter  Red’s  fighters  (...).  To  lower 
the  loss  of  his  bombers,  Blue  will  provide  a  few  escort  fighters  to  his  package.  After  each 
mission,  the  survivors  on  both  sides  return  to  their  bases,  where  they  are  fully  rearmed  and 
then  send  off  again  on  the  next  mission.  Successful  task  completion  is  defined  so  that  both 
the  target  must  be  destroyed  within  the  given  deadline  and  own  losses  must  not  exceed  a 
given  cap.  In  particular,  destroying  the  target  with  own  loss  exceeding  the  cap  is  considered 
a  failure.  The  task  is  over  whenever  the  ground  target  was  destroyed  (even  if  it  happens  in 
the  first  or  second  mission)  or  three  mission  have  been  flown  in  vain. 

Each  Blue  bomber  carries  a  payload,  whose  lethality  against  the  Red’s  ground  target  is  the 
first  number  in  the  lethality  matrix  element  (1,2).  (...)  For  self-defense,  it  has  cannons  whose 
lethality  against  the  Red  fighters  is  the  first  number  in  the  lethality  matrix  element  (1,1). 

Each Blue fighter is armed with 4 AA missiles, whose lethalities against the Red assets are
given  by  the  first  numbers  of  the  second  row  elements  of  the  lethality  matrix.  Note  that  the 
fighters  has  no  weapons  against  the  ground  target. 

Each  Red  fighter  is  equipped  with  4  AA  missiles,  whose  lethalities  against  Blue  bombers  and 
fighters  are  the  second  numbers  of  the  first  column  elements  of  the  lethality  matrix. 

The  Red  ground  target  is  passive  and  cannot  shoot  back  at  the  Blue  package. 

Both  the  Blue  and  Red  fighters  fire  one  missile  at  a  time  without  target  selection  coordination 
with  their  fellow  fighters.  (...)  Likewise  do  the  bombers.  Furthermore,  the  Blue  fighters  do 
not  coordinate  their  target  selection  with  the  bombers  (The  inter-asset  coordination.).” 

In our experiments, we assume that the Red fighters (R2) are targeting Blue bombers (B1) only.
Blue bombers (B1) are dropping bombs on the Red ground target (R1) and Blue fighters (B2) are firing
at the Red fighters (R2). This differs from the Honeywell model, in which the Blue bombers can also
fire at the Red fighters, with low lethality.

The target acquisition rate is set to one for all units, and the lethality numbers in [3] are used as the
probability of kill values at optimum distance, as shown in Table 15.1. Note that the probability of kill
decreases with distance in our model. These are the only parameters that are varied between the two
examples.

Table 15.1. Probability of Kill Values

             B1 on R1    B2 on R2    R2 on B1
Example 1    0.2         0.3         0.4
Example 2    0.2         0.6         0.6

Figure 15.2: Initial positions of the units.

The initial numbers of platforms in each unit are those generated by Honeywell’s component, as given
in [3], except that the Red ground unit is assumed to have 10 platforms initially, instead of a single target.
In this way, the expected number of platforms in the Red ground unit at the end of each sortie (loosely)
corresponds to ten times the probability of failure (one minus the probability of success). The initial
positions of the units are chosen as shown in Fig. 15.2.
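
The loose correspondence described above can be written as P{success} ≈ 1 − E[R1]/10, where E[R1] is the expected number of platforms left in the Red ground unit. The short sketch below applies this to the R1 values listed for the Washington U. runs in Tables 15.3 and 15.4; the function name `success_probability` is an illustrative choice, and rounding to two decimals is assumed to match the precision of the tables.

```python
def success_probability(expected_r1_remaining: float, initial_r1: float = 10.0) -> float:
    """Loose estimate of P{success}: the destroyed fraction of the Red ground unit."""
    return 1.0 - expected_r1_remaining / initial_r1


# Expected R1 platforms after Sorties 1-3 (Washington U. columns of the tables).
example_1_r1 = [8.60, 7.10, 6.70]
example_2_r1 = [9.90, 9.80, 9.60]

p_example_1 = [round(success_probability(r1), 2) for r1 in example_1_r1]
p_example_2 = [round(success_probability(r1), 2) for r1 in example_2_r1]

print(p_example_1)
print(p_example_2)
```

The computed values agree with the P{success} columns reported for the Washington U. runs, which is consistent with the tables having been produced by this rule.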

The  trajectories  and  the  firing  intensities  of  the  units  in  the  Nash  solution  (in  fact,  the  existence  of  a 
Nash  solution)  are  determined  by  the  weights  in  the  cost  function.  For  our  experiments,  the  weights  in 
Table  15.2  are  used. 

Note  that  it  is  possible  to  obtain  very  different  trajectories  and  final  results  by  changing  the  above 
weights. 


15.4  Experiment  Results  and  Analysis 

The Sequential Linear-Quadratic Method (SLQM) for the Nash solution computation is terminated after 50
iterations for all sorties. For Example 1, the behavior of the control update ($\delta u$) and the cost function,
as the iterations progress, are shown in Figs. 15.3 and 15.4, respectively.

Table 15.2. Weights in the Cost Function

                                   B1      B2      R1      R2
distance from destination          1E-3    0       0       0
running cost for platforms         1       1E-4    1       1E-4
final cost for platforms           10      1E-3    10      1E-3
distance from target enemy unit    0       5E-3    0       1E-3
cost on velocity (fuel)            25      25      25      25
cost on firing (weapons)           25      25      25      25

It is seen that convergence to a small value of $\delta u$ is not achieved in 50 iterations. Increasing the
number of iterations up to 200, or changing the step size parameter of the algorithm, did not improve the
convergence. For this reason, the results we present below may not correspond to the Nash
solution for this problem, although they appear to be reasonable given the mission objectives. In fact,
it is not known whether a Nash solution exists for this scenario and the associated cost function. There
are many different, equally acceptable choices of weights corresponding to the same mission statement.
However, we do not know a systematic method for determining those weights, and a trial-and-error
approach proved to be very time consuming and fruitless for both Examples 1 and 2.

The  trajectories  obtained  by  the  SLQM  algorithm  for  each  sortie  of  Example  1  are  depicted  in 
Figs.  15.5,  15.6  and  15.7. 

For Example 2, the behavior of the control update ($\delta u$) and the cost function, as the iterations
progress, are shown in Figs. 15.8 and 15.9, respectively. It is seen that convergence to a small value of $\delta u$
is not achieved in 50 iterations. As in Example 1, increasing the number of iterations up to 200, or
changing the step size parameter of the algorithm, did not improve the convergence. For this reason,
the results we present may not correspond to the Nash solution for this problem, although
they appear to be reasonable given the mission objectives.

The  trajectories  obtained  by  the  SLQM  algorithm  for  each  sortie  of  Example  2  are  depicted  in 
Figs.  15.10,  15.11  and  15.12. 

The  expected  platform  loss  and  probability  of  success  results  of  the  two  examples  are  summarized  in 
Tables  15.3  and  15.4.  The  Honeywell  results,  taken  from  [3],  are  based  on  the  “Bombers  First”  strategy 
for  the  Red  fighters  and  “max  loss  =  3”  assumption.  For  Washington  U.  tests,  the  same  strategy  for  the 
Red  fighters  is  enforced  by  assigning  the  Blue  bombers  as  the  sole  target  of  this  unit.  The  probability  of 
success  (the  probability  of  destroying  the  Red  ground  target)  is  calculated  from  the  expected  number  of 
remaining platforms in the Red ground unit R1.

In Example 1, the Honeywell and Washington U. results are comparable. Since the modeling assumptions
are similar, the sources of discrepancy are the effect of location of the units in the theater, and
the selection of weights in the cost function, which express the relative importance of achieving target
destruction versus platform loss, fuel and weapon consumption.
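
To make “comparable” concrete, the following sketch computes the expected Blue bomber (B1) loss over the three sorties and the gap in the final probability of success, using only entries of Table 15.3. The variable names are illustrative, and rounding to two decimals is assumed to match the table precision.

```python
# Values copied from Table 15.3 (Example 1): expected Blue bomber (B1)
# platforms before the first sortie and after the third sortie.
INITIAL_B1 = 7.00
final_b1 = {"Washington U.": 4.80, "Honeywell": 4.74}
p_success = {"Washington U.": 0.33, "Honeywell": 0.46}

# Expected number of bombers lost over the three sorties, per team of results.
b1_loss = {team: round(INITIAL_B1 - left, 2) for team, left in final_b1.items()}

# Difference in the reported probability of success after Sortie 3.
p_gap = round(p_success["Honeywell"] - p_success["Washington U."], 2)

print(b1_loss)
print(p_gap)
```

The two bomber-loss figures are close to each other, while the probability of success differs more noticeably; the table also shows that the Honeywell losses occur almost entirely in the first sortie, whereas the Washington U. losses are spread over all three.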

In  Example  2,  the  initial  number  of  Red  fighters  is  increased  to  5  and  their  probability  of  kill  against 
the  Blue  bombers  is  increased  to  0.6  from  0.4.  Even  though  the  cost  function  is  kept  the  same  as  in 
Example  1,  the  Blue  bomber  unit  B1  is  more  concerned  about  its  own  safety  and  rather  reluctant  to  get 
closer  to  its  destination,  as  seen  in  Figs.  15.10-15.12.  Therefore,  the  probability  of  success  is  much  lower. 
The  Honeywell  results  for  Example  2  are  not  available  at  this  time. 


Figure  15.10:  Trajectories  of  units,  Example  2,  Sortie  1. 


Table 15.3: Example 1: Probability of Success and Remaining Number of Platforms After Each Sortie

            Washington U.                                   Honeywell
            P{success}   B1      B2      R1      R2         P{success}   B1      B2      R2
Initial     -            7.00    3.00    10.0    3.00       -            7.00    3.00    3.00
Sortie 1    0.14         6.40    3.00    8.60    3.00       0.19         4.80    3.00    0.10
Sortie 2    0.29         5.50    3.00    7.10    2.90       0.34         4.74    3.00    0.002
Sortie 3    0.33         4.80    3.00    6.70    0.50       0.46         4.74    3.00    0.00

Table 15.4: Example 2: Probability of Success and Remaining Number of Platforms After Each Sortie

            Washington U.                                   Honeywell
            P{success}   B1      B2      R1      R2         P{success}   B1      B2      R2
Initial     -            7.00    3.00    10.0    5.00       -            7.00    3.00    5.00
Sortie 1    0.01         5.40    3.00    9.90    4.80       -            -       -       -
Sortie 2    0.02         4.30    3.00    9.80    4.70       -            -       -       -
Sortie 3    0.04         3.60    3.00    9.60    4.70       -            -       -       -

15.5  Conclusions  and  Recommendations 

In this experiment, we have compared the platform loss and probability of success values between
Washington U. and Honeywell results on two example missions, each consisting of three sorties. Despite running
our SLQM algorithm for 50 iterations, convergence to a small value of the control update (which would
indicate a possible Nash solution) was not achieved in either example, although the unit trajectories and
platform loss numbers were reasonable, given the mission objectives.

In the first example, our results were similar to Honeywell’s. The second example resulted in drastically
different attrition numbers, due to an increase in the initial number of Red fighters from 3 to 5 and a
50% increase in their probability of kill (from 0.4 to 0.6). This indicates the sensitivity of our algorithm,
and possibly the sensitivity of the Nash solution concept, to initial states. On the other hand, our solution
yields optimal (or near-optimal) routes and firing intensity values, given the cost function, which are not
part of the Honeywell model.

The  Honeywell  approach  is  based  on  maximizing  the  probability  of  success,  while  our  approach  tries 
to  find  the  saddle-point  of  a  cost  function.  It  may  be  worthwhile  to  spend  some  effort  to  investigate  how 
the  selection  of  weights  in  the  cost  function  affects  the  probability  of  success.  Even  better  would  be  to 
devise a “cost translator”, which will yield good weight values (for which a Nash solution exists) that can
reflect  the  trade-off  between  desired  probability  of  success  and  acceptable  platform  loss. 

Chapter 16


Experiment  16:  Controller 
Computational  Complexity: 
Correction 

16.1  Executive  Summary 

The  purpose  of  Experiment  16  is  to  correct  an  error  present  in  the  subprogram  that  evaluates  the  Jacobian 
of  the  model  MDCM.  This  error  would  have  affected  the  results  in  cases  in  which  multiple  units  are 
deployed  against  multiple  units  and  some  units  are  not  fired  upon.  This  error  affects  only  one  such  case 
in  the  Interim  Report  (experiments  1  through  12),  that  is  experiment  5.3.2.  Therefore,  a  corrected  version 
of  the  subprogram  for  computing  the  Jacobian  has  been  developed,  and  corrected  computational  results 
are  reported  in  this  chapter.  Even  with  this  change  we  can  draw  the  same  conclusions  as  in  Experiment 
5;  namely,  the  computational  time  is  a  quadratic  function  of  the  number  of  units. 


16.2  Introduction 

The original purpose of Experiment 5 in Chapter 5 of the Interim Report and of this Final Report was
to perform a number of experiments to test the following hypothesis: The computational complexity of
the differential game technology based controller increases quadratically as a function of the number of
units and linearly as a function of the mission duration.
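
One simple way to check such a hypothesis on timings recorded for n = 1, 2, ..., 5 units is to look at second finite differences: for equally spaced n they are constant when the growth is quadratic and zero when it is linear. The sketch below only illustrates this check; the helper name `second_differences` is an assumption, and the numbers are synthetic placeholders generated from an exact quadratic, not timings from the experiments.

```python
from typing import List, Sequence


def second_differences(values: Sequence[float]) -> List[float]:
    """Second finite differences of values sampled at n = 1, 2, 3, ..."""
    first = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(first, first[1:])]


# Placeholder values (NOT measurements from the report): an exact quadratic
# 2*n**2 + 3*n + 1 evaluated at n = 1..5, used only to show the check.
synthetic_quadratic = [2 * n**2 + 3 * n + 1 for n in range(1, 6)]

# Placeholder values for linear growth, e.g. versus mission duration.
synthetic_linear = [3, 5, 7, 9]

print(second_differences(synthetic_quadratic))
print(second_differences(synthetic_linear))
```

With measured computation times the second differences would only be approximately constant, so in practice a least-squares fit would be a more robust alternative.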

One  experiment  run  reported  in  Chapter  5  showed  that  the  units  not  being  fired  upon  did  not  move, 
contrary  to  intuitive  expectations.  After  a  careful  examination,  we  found  an  error  in  the  subprogram 
that  evaluates  the  Jacobian  of  the  MDCM  model  (Mission  Dynamics  Continuous-time  Model).  This 
error  would  have  affected  the  results  in  cases  in  which  multiple  units  are  deployed  against  multiple  units 
and  some  units  are  not  fired  upon.  There  is  only  one  such  case  in  the  Interim  Report  (Experiments  1 
through  12),  that  is  experiment  5.3.2,  in  which  two  units  out  of  six  remain  fixed  in  their  initial  positions. 
Therefore,  a  corrected  version  of  the  subprogram  for  computing  the  Jacobian  has  been  developed,  and 
corrected  computational  results  are  reported  in  this  chapter. 

In  the  original  experiments,  both  the  plant  and  controller  models  are  the  same,  given  by  MDCM.  In 
a  first  set  of  experiments  the  number  of  units  in  the  scenario  is  increased  while  the  mission  objectives 
and  duration  are  kept  constant.  In  a  second  set,  the  mission  duration  is  increased,  while  the  mission 
objectives  and  the  number  of  units  are  kept  constant.  The  computation  time  and  the  number  of  iterations 
required  for  the  computation  of  the  control  law  to  converge  were  recorded  in  both  cases. 

16.3  Experiment  5.1:  The  Number  Of  Units  Is  Increased  While
The  Mission  Duration  Is  Kept  Constant

In  this  set  of  experiments,  the  mission  duration  is  kept  constant  at  20  minutes. 

Five experiments have been conducted for each of the following cases: 1 unit vs. 1 unit, 2 vs. 2, 3 vs.
3, 4 vs. 4 and 5 vs. 5. In these 5 experiments for each n vs. n case ($1 \le n \le 5$), the unit categories,
initial conditions, target locations and nominal trajectories as well as the weights in the cost function
may vary. The computational time and the number of iterations are recorded for each experiment.

16.3.1  One  vs.  One 

In  this  section,  the  results  reported  in  Chapter  5  were  correct  and  thus  there  was  no  need  to  redo  the 
experiments. 

16.3.2  Multi-units  Case 

An example of the multiple-unit case was reported in Chapter 5 for 3 units vs. 3 units. Here the corrected
version of the Jacobian subprogram yields results in which all units now move from their respective
starting positions.

Table 16.1 summarizes the pertinent information for the two opposing forces in that specific example.
The manner of engagement in that example is as follows: R1 and R2 are programmed to attack B1, and R3 is
programmed to attack B3. B1 and B2 are programmed to attack R1 and R2 respectively, and B3 is
programmed to attack R3. The choice of the weights differs slightly from that of Experiment 5.1.


Table 16.1: Data for Three vs. Three

                           B1        B2        B3        R1        R2            R3
Unit categories            bombers   bombers   ground    bombers   interceptors  ground
Initial no. of platforms   10        10        10        10        10            10
Initial no. of weapons     10        10        10        10        10            10
Initial position           (20,53)   (20,50)   (45,47)   (80,53)   (80,50)       (55,47)
Target location            (70,63)   (80,52)   (53,48)   (30,63)   (20,48)       (43,46)

Figures 16.1 and 16.2 show, respectively, the initial state trajectories and the convergence of the control
updates $\|\delta u\|$. Figures 16.3 - 16.5 present the Nash solution: Figure 16.3 presents the Nash
solution trajectories, Figure 16.4 the corresponding firing intensities, and Figure 16.5
the history of the number of platforms. With the convergence criterion that the norm $\|\delta u\|$ of the control
change $\delta u$ be less than 0.01, convergence is attained after 34 iterations. The total simulation time is now
601.13 sec.
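
The stopping rule on the norm of the control change can be sketched as follows. This is only an illustration of the criterion, not the report's solver: `update` is a hypothetical stand-in for one pass of the control-law computation, and the toy contraction used to exercise the loop is likewise an assumption. Only the tolerance 0.01 comes from the text.

```python
import numpy as np

def iterate_until_converged(update, u0, tol=0.01, max_iter=1000):
    """Repeat u <- update(u) until the norm of the control change drops below tol.

    `update` is a hypothetical placeholder for one iteration of the
    control-law computation; it maps a control vector to a new one.
    Returns the final control and the number of iterations used.
    """
    u = np.asarray(u0, dtype=float)
    for iteration in range(1, max_iter + 1):
        u_new = update(u)
        delta_norm = np.linalg.norm(u_new - u)  # norm of the control change
        u = u_new
        if delta_norm < tol:
            return u, iteration
    raise RuntimeError("control updates did not converge")

# Toy update (assumption): a contraction with fixed point u = 2.
u_final, n_iter = iterate_until_converged(lambda u: 0.5 * u + 1.0, np.zeros(1))
```

For this toy update the change halves at every pass, so the loop stops as soon as the change first falls below the tolerance.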

The main difference from the previous simulation results reported in Chapter 5 is the movement
that units B3 and R3 now show, in agreement with the intuitive expectation based on the choice
of the weights for the cost function. In general, the corrected version of the Jacobian subprogram makes
a difference in scenario files in which some units are not shot at by an enemy unit.

16.3.3  Multi-units  Case  And  Computational  Complexity 

Due to the correction made to the Jacobian computation, the computational time (601.13 sec.) is slightly
longer than what was reported in Chapter 5 for the case of 6 units. Even with this change, we can draw
the same conclusion as before, namely that the computational time is a quadratic function of the number
of units. This can be verified by analyzing Figure 5.11 of Chapter 5, which is also shown here as
Figure 16.6.

Figure 16.1: Initial Trajectories for Three vs. Three

Figure 16.2: Convergence for Control Updates for Three vs. Three (horizontal axis: number of iterations)

Figure 16.3: Nash Trajectories for Three vs. Three

Figure 16.4: Nash Firing Intensities for Three vs. Three

Figure 16.5: Nash Number Of Platforms for Three vs. Three

Figure 16.6: The Computational Time Changes As The Number Of Units Is Increased

16.4 Experiment 5.2: The Mission Duration Is Increased While
The Number Of Units Is Kept Constant

In  this  set  of  experiments,  previous  results  still  hold. 


16.5  Conclusions 

The use of the corrected Jacobian subprogram does not substantially change the main conclusions of
Chapter 5. They are as follows. The computational time required to reach the convergence criterion depends on
many factors, such as the unit categories, the number of units, the initial trajectories, the weights in the cost
function, the step size in our numerical procedure and the manner of engagement, as well as the initial positions
and target locations. Similarly, the number of iterations required to reach convergence depends on the
same factors. In our experimental results, the major factors affecting the computational time are the
number of units and the mission duration. For our experiments, the computational time of the controller
increased quadratically as a function of the number of units and linearly as a function of the mission
duration, while the number of iterations itself remained relatively constant as a function of the number
of units. This last point was a pleasant surprise.

Chapter 17


Experiment  17:  Controller  with  a 
Kalman  Filter  for  Estimation 

17.1  Executive  Summary 

In  this  chapter,  we  present  how  an  algorithm  based  on  the  Extended  Kalman  Filter  (EKF)  for  state 
estimation  is  used  in  a  differential  game,  which  models  the  air  operations  of  two  opposing  forces.  We 
show  the  overall  structure  of  the  game  in  a  block  diagram.  We  present  the  implementation  of  the  algorithm 
in  a  flowchart.  We  also  present  simulation  results. 

In an air operation game, it is reasonable to assume that neither side gets direct information about
the enemy's input. In this chapter, we present an approach for estimating the states of the friendly as well
as the enemy forces and compare their respective simulation results. The Kalman filter due to Darouach et
al. treats the enemy inputs as part of the extended state and obtains an estimate of both the state of
the two forces and the input of the enemy. Their filter, however, is designed for linear time-invariant systems.
Hence, we present an extension of their filter to a nonlinear time-variant system.

The extended Kalman filter algorithm presented in this report is capable of estimating the states of
both forces in the presence of process noise as well as sensor noise. We note that the estimates of the
enemy inputs are too noisy to be directly useful. However, our game-theoretic controller requires only an
estimate of the enemy state and does not require any estimate of the enemy input. We thus observed that
the game-theoretic controller remained effective when the extended Kalman filter was introduced in the
loop.


17.2  Introduction 


The purpose of the experiment is to show that the current differential game technology, combined with
an extended Kalman filter, provides an effective means of countering the enemy actions under idealized
situations with perfect information about enemy initial conditions and objectives, but with noisy
measurements of a subset of the enemy state.

Description: Both the plant and internal models are the same, i.e., the MDCM (Mission Dynamics
Continuous Model). Increasing levels of noise will be added to the state variables when constructing the
observed state variables (the output variables). Some of the enemy state variables (weapons per platform
first, and number of adversary platforms next) will be removed from the set of output variables, thus
making them not directly observable. The control actions of the Blue and Red teams are generated by
the proposed game-theoretic algorithm.

The current differential game technology, combined with an extended Kalman filter (EKF), provides
an effective means of countering the enemy actions under idealized situations with perfect information
about enemy initial conditions and objectives, but with noisy measurements of a subset of the enemy
state. The algorithm based on the EKF adequately estimates the unknown Red state in the presence of
process and observation noise.

Consider a dynamical system governed by the following equation,

$$\frac{dx(t)}{dt} = f(x(t), u(t), t) + w(t), \quad t \in [t_0, t_f]; \quad x(t_0) = z_0, \tag{17.1}$$

and an observation process given by

$$y(t) = h(x(t), u(t), t) + v(t), \quad t \in [t_0, t_f], \tag{17.2}$$

where the control $u$ is an $\mathbb{R}^m$-valued function on $[t_0, t_f]$, $f(x, u, t)$ is an $\mathbb{R}^n$-valued continuously
differentiable function on $\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}$, $h(x, u, t)$ is an $\mathbb{R}^p$-valued continuously differentiable function on
$\mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}$, the initial state $z_0$ is a Gaussian random variable, and the process noise $w(t)$ and the
measurement (sensor) noise $v(t)$ are Gaussian white noise processes. We assume that these random variables
and random processes are mutually independent. For any fixed initial state $x(t_0) = z_0$ and any admissible
control $u$ of some restricted class $U$, we assume that equation (17.1) has a unique solution $x$. Such a
solution $x$ is called the trajectory of the system produced by control $u$ and is denoted by $x[u]$. A rigorous
definition of (17.1) by Ito's stochastic integral and the theory of stochastic differential equations can be
found in [1], [3] and [4].

Our dynamical system (17.1)-(17.2) models a game played by opposing military forces in battle
through their air operations. The control function $u$ consists of two parts, $u^B$ and $u^R$, corresponding
to the two forces, the Blue and the Red forces: $u = (u^B, u^R)$. In actual theaters of military air
operations, information is sometimes unavailable and, even when available, is corrupted by measurement
errors and misleading signals from the enemy. These corruptions are modeled as white noise error
processes in (17.1)-(17.2). We investigate the problem of estimating the state of the enemy from noisy
signals about the enemy location without knowing the numbers of the enemy platforms (a part of the
enemy state) or the control inputs $u^R$ of the enemy. We then propose an extended Kalman filter for
the problem and evaluate its effectiveness in the closed loop of a game-theoretic controller.

In this report, for practical purposes and the flexibility of analysis, we replace the continuous-time
model (17.1)-(17.2) by the following discrete-time dynamics and observation process:

$$x_{k+1} = f_k(x_k, u_k) + w_k, \quad k \ge 0; \quad x_0 = z_0, \tag{17.3}$$

$$y_k = h_k(x_k, u_k) + v_k, \quad k \ge 0, \tag{17.4}$$

where $f_k(x, u)$ is an $\mathbb{R}^n$-valued continuously differentiable function on $\mathbb{R}^n \times \mathbb{R}^m$, $h_k(x, u)$ is an $\mathbb{R}^p$-valued
continuously differentiable function on $\mathbb{R}^n \times \mathbb{R}^m$, the initial state $z_0$ is a Gaussian random variable,
and the process noise $w_k$ and the measurement noise $v_k$ are zero-mean Gaussian white noise sequences
uncorrelated with each other and with the initial state $z_0$ of the system. Their respective covariance
matrices are given as

$$E(ww') = W, \quad E(vv') = V, \quad \text{where } W > 0 \text{ and } V > 0 \text{ are diagonal.} \tag{17.5}$$

Here, we also assume that each control input $u_k$ consists of two parts, $u_k^B$ and $u_k^R$, corresponding to the
two forces, the Blue and the Red forces: $u_k = (u_k^B, u_k^R)$. In this report, we take the point of view of the
Blue force and construct a Kalman filter for the Blue force. We suppose that the enemy inputs (i.e., $u_k^R$)
are unknown. Thus, we need filters which do not make use of the enemy inputs $u_k^R$.
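
As an illustration of the discrete-time model (17.3)-(17.5), the following minimal sketch generates a state and observation sequence with diagonal noise covariances. The function name `simulate`, the planar toy dynamics, the step size and the noise variances are assumptions made for this example and are not taken from the report.

```python
import numpy as np

def simulate(f, h, x0, controls, W_diag, V_diag, rng):
    """Simulate x_{k+1} = f(k, x_k, u_k) + w_k and y_k = h(k, x_k, u_k) + v_k.

    W_diag and V_diag hold the diagonal entries (variances) of W and V.
    Returns the arrays of states x_k and observations y_k.
    """
    x = np.asarray(x0, dtype=float)
    w_std = np.sqrt(np.asarray(W_diag, dtype=float))
    v_std = np.sqrt(np.asarray(V_diag, dtype=float))
    states, outputs = [], []
    for k, u in enumerate(controls):
        y = h(k, x, u) + v_std * rng.standard_normal(v_std.shape)
        states.append(x.copy())
        outputs.append(y)
        x = f(k, x, u) + w_std * rng.standard_normal(w_std.shape)
    return np.array(states), np.array(outputs)

# Hypothetical toy model: a planar position driven by a velocity input,
# with the position observed directly.
dt = 0.1
f = lambda k, x, u: x + dt * u
h = lambda k, x, u: x
controls = [np.array([1.0, 0.0])] * 50

xs, ys = simulate(f, h, [0.0, 0.0], controls,
                  W_diag=[1e-4, 1e-4], V_diag=[0.25, 0.25],
                  rng=np.random.default_rng(0))
```

Setting both variance vectors to zero recovers the noise-free trajectory, which is a convenient sanity check before noise is switched on.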




Section 17.3 is devoted to a Kalman filter technique for estimation in systems with unknown inputs.
We present a Kalman filter for estimating the enemy state and the enemy inputs for a linear system. This
filter is adapted from the result by Darouach et al. [2]. We also propose an extended Kalman filter
for a nonlinear system, apply it to a nonlinear, continuous-time, stochastic air operation game, and
present the differential game that models the battlefield. The experimental scope and setup are given
in Section 17.4, and the simulation results and analysis are given in Section 17.5.
Section 17.6 presents the conclusions.


17.3  Kalman  filters  for  systems  with  unknown  inputs 

Linear Kalman filter for estimating states and enemy inputs: We consider the following linear
model:

$$x_{k+1} = A_k x_k + B_k^B u_k^B + B_k^R u_k^R + w_k, \quad k \ge 0; \quad x_0 = z_0, \tag{17.6}$$

and

$$y_k = H_k x_k + v_k, \quad k \ge 0. \tag{17.7}$$

Here, for time $k$, $x_k \in \mathbb{R}^n$ denotes the state vector, $u_k^B \in \mathbb{R}^{m_1}$ denotes the known Blue control input,
$u_k^R \in \mathbb{R}^{m_2}$ denotes the unknown Red control input and $y_k \in \mathbb{R}^p$ denotes the output vector. The noise
processes $w_k$ and $v_k$ are as given in Section 17.2, and the matrices $A_k$, $B_k^B$, $B_k^R$ and $H_k$ have appropriate
dimensions.

We consider the problem of recursive estimation for the linear system (17.6)-(17.7): the state
vector $x_k$ and the unknown input $u_k^R$ are to be estimated by Kalman filtering without knowing the value
of the enemy input $u_k^R$. One possible solution is to construct a Kalman filter by defining a new state
vector $X_k = (x_k', (u_{k-1}^R)')'$. This approach was taken and worked out for the time-invariant case by
Darouach et al. [2] under the time-invariant version of the following assumption:

Rank Condition 1

$$\operatorname{rank}(H_k) = p, \quad \operatorname{rank}(B_k^R) = m_2, \quad m_2 \le p, \quad \text{and} \quad \operatorname{rank}(H_k B_k^R) = m_2, \quad \text{for all } k.$$

Let $\hat{x}_{k/l}$ denote the estimate of $x_k$ based on the measurements $y$ up to and including time instant $l$.
Other estimates are defined similarly. We extend the Kalman filter in Theorem 3 in [2], which is for a
time-invariant system, to the following time-variant version.
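
Rank Condition 1 is easy to check numerically for given matrices. The sketch below is an illustration only: the helper name `rank_condition_1` and the two small example matrices are assumptions, not part of the report.

```python
import numpy as np

def rank_condition_1(H, B_R):
    """Check rank(H) = p, rank(B_R) = m2, m2 <= p and rank(H B_R) = m2."""
    H = np.atleast_2d(np.asarray(H, dtype=float))
    B_R = np.atleast_2d(np.asarray(B_R, dtype=float))
    p = H.shape[0]     # number of outputs
    m2 = B_R.shape[1]  # number of unknown (Red) inputs
    rank = np.linalg.matrix_rank
    return bool(rank(H) == p and rank(B_R) == m2
                and m2 <= p and rank(H @ B_R) == m2)

# Hypothetical example: three states, the first two of which are measured.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
B_seen = np.array([[1.0], [0.0], [0.0]])    # unknown input drives a measured state
B_hidden = np.array([[0.0], [0.0], [1.0]])  # unknown input drives only the unmeasured state

ok_seen = rank_condition_1(H, B_seen)
ok_hidden = rank_condition_1(H, B_hidden)
```

In the second case the product $H B^R$ is zero, so the unknown input leaves no direct trace in the measurements and the condition fails.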

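Proposition 5. Assume that Rank Condition 1 is satisfied for the linear system (17.6)-(17.7). Then the
optimal estimates for the entire state and the unknown Red inputs are obtained by

$$\bar{x}_{k/k} := A_k \hat{x}_{k/k} + B_k^B u_k^B, \tag{17.8}$$

$$\hat{x}_{k+1/k+1} = \bar{x}_{k/k} + B_k^R \hat{u}^R_{k/k+1} + K^x_{k+1} \left[ y_{k+1} - H_{k+1} \left( \bar{x}_{k/k} + B_k^R \hat{u}^R_{k/k+1} \right) \right], \tag{17.9}$$

$$\hat{u}^R_{k/k+1} = K^u_{k+1} \left( y_{k+1} - H_{k+1} \bar{x}_{k/k} \right), \tag{17.10}$$

where the Kalman gain matrices, $K^x_{k+1}$ for the state estimate and $K^u_{k+1}$ for the unknown input estimate,
are obtained respectively by

$$K^x_{k+1} = \left( \bar{P}_{k/k}^{-1} + H_{k+1}' V^{-1} H_{k+1} \right)^{-1} H_{k+1}' V^{-1}, \tag{17.11}$$

[Equation (17.12), the expression for $K^u_{k+1}$, is not legible in the source.]

where

$$\bar{P}_{k/k} = A_k P^x_{k/k} A_k' + W, \tag{17.13}$$

and the estimation error covariance matrix

$$P_{k/k} = E\left[ \left( X_k - \hat{X}_{k/k} \right) \left( X_k - \hat{X}_{k/k} \right)' \right]$$

with $X_k = (x_k', (u^R_{k-1})')'$ and $\hat{X}_{k/k} = (\hat{x}_{k/k}', (\hat{u}^R_{k-1/k})')'$ is partitioned as

$$P_{k/k} = \begin{pmatrix} P^x_{k/k} & P^{xu}_{k/k} \\ P^{ux}_{k/k} & P^u_{k-1/k} \end{pmatrix}, \tag{17.14}$$

and each block matrix has the following form:

$$P^u_{k/k+1} = \left( B_k^{R\prime} H_{k+1}' \left( V + H_{k+1} \bar{P}_{k/k} H_{k+1}' \right)^{-1} H_{k+1} B_k^R \right)^{-1}, \tag{17.15}$$

$$P^{xu}_{k+1/k+1} = P^x_{k+1/k+1} \bar{P}_{k/k}^{-1} B_k^R \left( B_k^{R\prime} \bar{P}_{k/k}^{-1} B_k^R \right)^{-1}, \tag{17.16}$$

$$P^{ux}_{k+1/k+1} = P^u_{k/k+1} B_k^{R\prime} \bar{P}_{k/k}^{-1} \left( \bar{P}_{k/k}^{-1} + H_{k+1}' V^{-1} H_{k+1} \right)^{-1}, \tag{17.17}$$

$$P^x_{k+1/k+1} = \left( \bar{P}_{k/k}^{-1} + H_{k+1}' V^{-1} H_{k+1} - \bar{P}_{k/k}^{-1} B_k^R \left( B_k^{R\prime} \bar{P}_{k/k}^{-1} B_k^R \right)^{-1} B_k^{R\prime} \bar{P}_{k/k}^{-1} \right)^{-1}. \tag{17.18}$$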

Extended Kalman filter for estimating states and enemy inputs: We assume in this section
that the function $f_k$ defined in (17.3) is nonlinear but the function $h_k$ defined in (17.4) is linear, i.e.,

$$y_k = H_k x_k + v_k, \quad k \ge 0. \tag{17.19}$$

Since $f_k(x, u)$ is a continuously differentiable function of $(x, u)$, we apply a linear approximation to the
right hand side of (17.3) around $(\hat{x}_{k/k}, \hat{u}_k)$, where $\hat{u}_k = (u_k^B, \hat{u}^R_{k/k+1})$. This is a good approximation so
long as the deviation from the linearization point is small. Then, by applying the Kalman filter described
in Proposition 5 to this linear (or affine) system and also replacing the state $x_k$ by its estimate $\hat{x}_{k/k}$
and the unknown input $u_k^R$ by its estimate $\hat{u}^R_{k/k+1}$, we obtain the following extended Kalman filter:

$$\bar{x}_{k/k} := A_k \hat{x}_{k/k} + B_k^B u_k^B, \tag{17.20}$$

$$\hat{x}_{k+1/k+1} = \bar{x}_{k/k} + B_k^R \hat{u}^R_{k/k+1} + K^x_{k+1} \left[ y_{k+1} - H_{k+1} \left( \bar{x}_{k/k} + B_k^R \hat{u}^R_{k/k+1} \right) \right], \tag{17.21}$$

$$\hat{u}^R_{k/k+1} = K^u_{k+1} \left( y_{k+1} - H_{k+1} \bar{x}_{k/k} \right), \tag{17.22}$$

where

$$A_k = \frac{\partial f_k}{\partial x}(\hat{x}_{k/k}, \hat{u}_k), \tag{17.23}$$

$$B_k^B = \frac{\partial f_k}{\partial u^B}(\hat{x}_{k/k}, \hat{u}_k), \tag{17.24}$$

$$B_k^R = \frac{\partial f_k}{\partial u^R}(\hat{x}_{k/k}, \hat{u}_k), \tag{17.25}$$

[Equation (17.26) is not legible in the source.]

and where the Kalman gain matrices and the estimation error covariance matrix have the same form as in
Proposition 5.
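
The Jacobians needed for the linearization can be approximated numerically. The sketch below uses central finite differences, which is a choice made for this example; the report does not state how its Jacobian subprogram computes the derivatives. The helper names and the two-state toy dynamics are hypothetical.

```python
import numpy as np

def jacobian(fun, z, eps=1e-6):
    """Central-difference Jacobian of fun at z (one column per component of z)."""
    z = np.asarray(z, dtype=float)
    n_out = np.asarray(fun(z), dtype=float).size
    J = np.zeros((n_out, z.size))
    for j in range(z.size):
        dz = np.zeros_like(z)
        dz[j] = eps
        J[:, j] = (np.asarray(fun(z + dz)) - np.asarray(fun(z - dz))) / (2.0 * eps)
    return J

def linearize(f_k, x_hat, u_B, u_R_hat):
    """Return (A_k, B_k^B, B_k^R): partial derivatives of f_k at the estimates."""
    A = jacobian(lambda x: f_k(x, u_B, u_R_hat), x_hat)
    B_B = jacobian(lambda u: f_k(x_hat, u, u_R_hat), u_B)
    B_R = jacobian(lambda u: f_k(x_hat, u_B, u), u_R_hat)
    return A, B_B, B_R

# Hypothetical toy dynamics: a position moved by the Blue velocity, and a
# platform count depleted in proportion to the Red firing intensity.
def f_toy(x, u_B, u_R):
    return np.array([x[0] + 0.1 * u_B[0],
                     x[1] * (1.0 - 0.1 * u_R[0])])

A_k, B_B_k, B_R_k = linearize(f_toy,
                              np.array([1.0, 10.0]),
                              np.array([2.0]),
                              np.array([0.5]))
```

For these toy dynamics the analytic Jacobians are simple enough to compare against by hand, which is a useful check before the routine is trusted on a larger model.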

Extended Kalman filter algorithm: The block diagram of the extended Kalman filter for estimating
the state and the enemy input is shown in Fig. 17.1. This diagram also shows the continuous-time plant
and how the filter is connected to the plant.

The inputs to the filter are the sampled output vector $y_k$ of the plant and the sampled input vector
$u_k^B$ for the friendly Blue unit. The outputs of the filter are the respective estimates, $\hat{x}_{k/k}$ and $\hat{u}^R_{k/k+1}$, of the
state vector $x_k$ and the enemy input vector $u_k^R$. The flowchart of the algorithm is given in Fig. 17.2. As
the original filter due to Darouach et al. was devised for linear, time-invariant discrete-time plants, we
employ samplers and an extension of their Kalman filter to a nonlinear time-varying system. We thus
linearized our model around an estimated nominal trajectory and discretized it. We then applied the
Kalman filter algorithm to the linearized discretized model.

Figure 17.1: Block diagram of the extended Kalman filter (block label: Extended Kalman filter for estimating state and enemy input)

[Figure 17.2: flowchart of the algorithm; only the block label "Compute: State Estimation" is recoverable]

Differential game: The overall game is expressed as the following minmax problem:

$$J^* = \min_{u^B} \max_{u^R} J(u^B, u^R), \tag{17.27}$$

where the Red force tries to maximize the payoff function $J(u^B, u^R)$ and the Blue force tries to minimize
the same payoff function. Here the problem is defined over the time interval $[t_0, t_f]$ between the initial
time $t_0$ and the terminal time $t_f$. The optimal value $J^*$ of the payoff function $J(u^B, u^R)$ is called the
value of the game.
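
The meaning of the minmax value can be illustrated on a static toy problem. The sketch below evaluates a scalar payoff on a grid, lets Red pick the maximizing reply to each Blue choice, and then lets Blue minimize that worst case. It only illustrates the definition of the value of the game; it is not the Riccati-based iterative method used in this report, and the quadratic payoff and the grid are assumptions.

```python
import numpy as np

def minmax_on_grid(J, blue_grid, red_grid):
    """Return (u_B*, u_R*, J*) for min over Blue of max over Red on a grid."""
    payoff = np.array([[J(ub, ur) for ur in red_grid] for ub in blue_grid])
    worst_case = payoff.max(axis=1)   # Red's best reply to each Blue choice
    i = int(worst_case.argmin())      # Blue minimizes the worst case
    j = int(payoff[i].argmax())       # Red's reply at Blue's choice
    return blue_grid[i], red_grid[j], worst_case[i]

# Hypothetical payoff: convex in the Blue control, concave in the Red control.
J = lambda ub, ur: ub**2 - ur**2 + 0.5 * ub * ur
grid = np.linspace(-1.0, 1.0, 41)

ub_star, ur_star, value = minmax_on_grid(J, grid, grid)
```

For this payoff the saddle point lies at the origin, so neither side can improve its outcome by deviating unilaterally.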

We assume that each unit is homogeneous. By this we mean that each bomber unit consists of
bombers of the same type, each ground troop unit consists of ground troops of the same type, and so
on. In other words, each unit consists of platforms of the same type. The platforms we may consider
are bombers, SAM missile launchers, electronic jammers, weasels, fighter-interceptors, personnel carriers
and tanks.

We report the simple cases of our differential games in this report: The Blue and Red forces have
one, two and four units each depending on the scenario type. Both units start with 10 platforms. For
example, in the scenario cross11, the Blue unit (B1) starts with 10 interceptors and the Red unit (R1)
starts with 10 bombers.

The  friendly  (Blue)  unit  control  is  based  on  the  optimum  linear  feedback  of  the  estimated  state  around 
the  Nash  solution.  The  adversary  (Red)  unit  control  input  may  be  given  manually  by  a  human  operator. 
The  goal  in  the  game  is  as  follows. 

The  Blue  interceptors  try  to  destroy  as  many  Red  bombers  as  possible  and  to  reach  their  own  respective 
destinations,  and  the  Red  bombers  try  to  preserve  their  own  platforms  and  to  reach  their  own  respective 
destinations. 

If  the  estimated  state  vector  deviates  from  the  current  Nash  equilibrium  solution,  the  iteration  stops 
and  a  new  Nash  equilibrium  solution  is  recalculated  over  the  remaining  time  period. 

The  numerical  method  for  finding  the  Nash  equilibrium  solution  is  an  iterative  process  in  which  a 
linear-quadratic  approximation  of  the  original  game  is  successively  solved  using  the  Riccati  equation 
approach  [7]. 

17.4  Experiment  scope  and  setup 

The experimental setup for the game-theoretic controller with the Kalman filter is given in Fig. 17.3. The
algorithm has been tested for the military air operation model, which is nonlinear and continuous-time.
The dynamic model of air operations for the military, and the formulation of the problem of controlling
its missions as a differential game, are presented in [6]. As the original Kalman filter [2] was devised for
linear discrete-time plants, we introduce samplers and extend the filter to a nonlinear system. Our model is
thus linearized around a nominal trajectory and discretized, and then the algorithm is run. We consider
the simplest case here: The Blue and Red forces have one unit each. Both units start with 10
platforms. Specifically, the Blue unit (B1) starts with 10 interceptors and the Red unit (R1) starts with 10
bombers.

In  all  the  graphs  to  follow,  the  solid  line  represents  the  actual  values  and  the  dotted  line  represents 
the  estimated  values. 

In our simulations, the extended Kalman filter takes as inputs the noisy observations of the position
of the Blue force, $(\xi_1^B, \xi_2^B)$, the number of Blue platforms, $\eta^B$, the position of the Red force, $(\xi_1^R, \xi_2^R)$, and
the Blue input $\pi^B$, and it yields as output the estimates for the same variables and, in addition, the number
of Red platforms, $\eta^R$. We note that the filter gets as input neither the number of Red platforms, $\eta^R$, nor
the enemy input, $u^R$.

The control inputs for the Blue and Red units are the respective velocities $(\mu_1^B, \mu_2^B)$ and $(\mu_1^R, \mu_2^R)$,
and the respective firing intensities $(\pi^B, \pi^R)$.

If the Blue and Red units are not engaged with each other, there is no need to estimate the Red firing
intensity $\pi^R$, so it is estimated only when an engagement occurs. This was implemented as follows. While
the distance between the Blue and Red forces is large, the sensitivity of the number of Blue platforms
$\eta^B$ to the firing intensity $\pi^R$ of the Red force is weak, and the observability rank condition is close to
being violated, which makes the Kalman filter ineffective. Hence, while the distance between the Blue and Red
forces is large, we do not estimate the firing intensity $\pi^R$ of the Red force; we estimate only the
velocity control $(\mu_1^R, \mu_2^R)$ and the entire state $x$. In this case, we assume that the Red force does not
fire: $\pi^R = 0$.
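
The switching rule described above can be sketched as a small helper. The report does not state the distance at which the filter starts estimating the Red firing intensity, so the threshold `ENGAGEMENT_RANGE` below is a hypothetical value, as are the function name and the returned field names.

```python
import numpy as np

ENGAGEMENT_RANGE = 10.0  # hypothetical threshold; not given in the report

def red_inputs_to_estimate(blue_pos, red_pos, engagement_range=ENGAGEMENT_RANGE):
    """Decide which Red inputs the filter estimates at the current distance.

    The Red velocity is always estimated. The Red firing intensity is
    estimated only within engagement range; otherwise it is assumed to be 0.
    """
    offset = np.asarray(blue_pos, dtype=float) - np.asarray(red_pos, dtype=float)
    distance = float(np.linalg.norm(offset))
    engaged = distance <= engagement_range
    return {
        "estimate_velocity": True,
        "estimate_firing": engaged,
        "assumed_firing": None if engaged else 0.0,
    }

far = red_inputs_to_estimate((20.0, 50.0), (80.0, 50.0))
near = red_inputs_to_estimate((50.0, 50.0), (53.0, 54.0))
```

Dropping the firing intensity from the estimated inputs while the forces are far apart keeps the filter away from the nearly unobservable direction described in the text.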

We tested the following three levels of noise: high-level noise, at 1% of the operating value for the
Blue states and 5% of the operating value for the Red states; medium-level noise, at 0.5% of the operating
value for the Blue states and 2.5% of the operating value for the Red states; and low-level noise, at 0.25% of
the operating value for the Blue states and 1.2% of the operating value for the Red states.


Figure 17.3: The closed-loop game theoretic controller combined with the Kalman filter. (Diagram title:
Game-Theoretic Controller with Kalman Filter. Block labels: Battlefield; Information, Surveillance,
Reconnaissance; Control. Signal labels: State $x$, $u^B$, $u^R$, Measurements.)

We ran experiments for two scenarios: cross11 and cross23.

We conducted simulations for cross11 for the following 3 cases:

(1) Observation noise only, i.e., no process noise;

(2) Process noise only, i.e., no observation noise;

(3) Both process and observation noise.

For each case, we tested 3 noise levels: low, medium, and high. In this report, only the results
for the high noise level are presented for the above three cases of the cross11 scenario.

For the cross23 scenario, on the other hand, we conducted simulations only for the low sensor noise case.
Since the number of platforms for the Red force, $\eta^R$, is not observed, a small error in its estimate
may occur. The estimated enemy inputs (velocities and firing intensity) are used only for the state
estimation and not for feedback control. Therefore, the fluctuations in the input estimates do not cause
much error in the overall performance of the controller.

In  the  following  graphs  for  our  simulations,  the  solid  line  represents  the  actual  (exact)  value  and  the 
dotted  line  represents  the  estimated  value. 


17.5  Experiment  Results  and  Analysis 

I - Simulation results for the scenario cross11

In this scenario, the Blue force has an interceptor unit and the Red force has a bomber unit. For this game,
the state variables and the control inputs vary as in Figs. 17.4-17.8. The observed trajectories and the
observed numbers of platforms are corrupted by sensor noise. The scenario is illustrated by these
figures. Fig. 17.4 shows the trajectories of the Blue and Red forces. The Blue forces move from west to
east and the Red forces move from north to south. During the mission an engagement occurs. Due to the
engagement, both forces lose some platforms, as seen in Fig. 17.5. Fig. 17.6 and Fig. 17.8
illustrate the weapons used in the mission and the firing intensities, respectively. Firing takes place when
the two forces meet. The speed controls are depicted in Fig. 17.7.

Figure 17.4: Observed Trajectories of Units (legend: B1: interceptor, R1: bomber)

Figure 17.5: Observed Numbers of Platforms

Figure 17.6: Weapons per platform

Figure 17.7: Speed Controls

Figure 17.8: Fire Intensities (legend: B1: blu, R1: red)


The Blue states, Red states and Red inputs for the three high-noise cases are presented in Figs. 17.9-17.17.
The simulation results show that the error in the Kalman filter estimate is acceptable, though the
deviation in the estimates of the number of Blue platforms is considerable.

The first two graphs in each of the Blue and Red state figures show the exact, observed and estimated values
of the x and y positions of the Blue and Red forces. The third graph shows the exact, observed and estimated
values of the number of platforms. Note that we do not plot the observed number of Red platforms,
because it is not observed.

The first two graphs in each of the Red input figures show the exact and estimated values of the x and y
speeds of the Red force. The third graph shows the exact and estimated values of the Red firing intensity.
Although the estimates of the speeds and firing intensity fluctuate strongly, their effect on the state
estimation is small.

Case (1) Observation noise only, no process noise

Process noise: $w = 0$

Covariance for the sensor noise: $V = \operatorname{diag}[(0.5)^2, (0.5)^2, (0.08)^2, (2.5)^2, (2.5)^2]$

Sensor noise is added to the state vector $[\xi_1^B, \xi_2^B, \eta^B, \xi_1^R, \xi_2^R]$.

The Blue states, Red states and enemy inputs (actual and estimated) are presented in Fig. 17.9, Fig.
17.10, and Fig. 17.11, respectively.

Figure 17.11: Red inputs for high sensor noise


Case (2) Process noise only, no observation noise

Covariance for the process noise: $W = \operatorname{diag}[(0.5)^2, (0.5)^2, (0.08)^2, (2.5)^2, (2.5)^2]$

Process noise is added to the state vector $[\xi_1^B, \xi_2^B, \eta^B, \xi_1^R, \xi_2^R]$.

Sensor noise: $v = 0$

The Blue states, Red states and enemy inputs (actual and estimated) are presented in Fig. 17.12,
Fig. 17.13, and Fig. 17.14, respectively.

Figure 17.14: Red inputs for high process noise


Case (3) Both observation and process noise

Covariance for the process noise: $W = \operatorname{diag}[(0.5)^2, (0.5)^2, (0.08)^2, (2.5)^2, (2.5)^2]$

Process noise is added to the state vector $[\xi_1^B, \xi_2^B, \eta^B, \xi_1^R, \xi_2^R]$.

Covariance for the sensor noise: $V = \operatorname{diag}[(0.5)^2, (0.5)^2, (0.08)^2, (2.5)^2, (2.5)^2]$

Sensor noise is added to the state vector $[\xi_1^B, \xi_2^B, \eta^B, \xi_1^R, \xi_2^R]$.

The Blue states, Red states and enemy inputs (actual and estimated) are presented in Fig. 17.15, Fig.
17.16, and Fig. 17.17, respectively.

Figure 17.17: Red inputs for high process and sensor noise


II - Simulation results for the scenario cross23

In this scenario, the Blue force has an interceptor and a bomber, and the Red force has two interceptors
and a ground troop. The Blue interceptor and bomber head for the Red ground troop.

For this game, the state variables and the control inputs vary as in Figs. 17.18-17.22 (note that the
exact results, rather than the observed results, are presented here). The scenario is illustrated by these figures.
Fig. 17.18 shows the trajectories of the Blue and Red forces. The Blue bomber and Blue interceptor
move from west to east towards the Red ground troop. One Red interceptor moves from north to south
to its destination, and the other Red interceptor moves from south to north and also defends the ground
troop. During the mission an engagement occurs. Due to the engagement, both forces lose some
platforms, as seen in Fig. 17.19. Fig. 17.20 and Fig. 17.22 illustrate the weapons used in the mission
and the firing intensities, respectively. Firing takes place when the forces meet. The speed
controls are depicted in Fig. 17.21.

Figure 17.18: Trajectories of Units

Figure 17.19: Number of Platforms

Figure 17.20: Weapons per platform

Figure 17.22: Fire Intensities (legend: B1: blu, B2: cyn, R1: red, R2: mag, R3: grn)

Low  level  observation  noise  only,  no  process  noise 

The Blue states, Red states and enemy inputs (exact, observed, and estimated) are presented in Fig. 17.23-17.30, respectively.

Figure 17.25: Red states for high sensor noise for unit 1 (horizontal axes: time (min.))

Figure 17.26: Red states for high sensor noise for unit 2

* and for unit3

Figure 17.29: Red inputs for high sensor noise for unit 2

Figure 17.30: Red inputs for high sensor noise for unit 3


17.6  Conclusions 

In this report, we have presented how the extended Kalman filter algorithm for state estimation is used in a differential game that models the air operations of two opposing forces. The air operation model is nonlinear and time-varying. As the filter due to Darauch et al. is designed for linear time-invariant systems, we have developed an extension of their filter to a nonlinear time-varying continuous system.

The extended Kalman filter presented in this report is capable of estimating the states in the presence of process noise as well as sensor noise of different magnitudes.

We have observed that the game-theoretic controller remains effective when the extended Kalman filter is introduced in the feedback loop. The closed-loop filter performance is verified by simulations of different scenarios.


Chapter 18


Experiment  18:  Method  of 
Characteristics:  Addendum 

18.1  Executive  Summary 

The purpose of Experiment 11 was to verify that the solution computed by the Sequential Linear-Quadratic Method (SLQM) was the same as the Nash solution computed by the Method of Characteristics. We verified that the solutions computed by the SLQM were indeed the same as the Nash solutions computed by the Method of Characteristics under several scenarios. However, the experiments in Chapter 11 all involved one Blue unit against one Red unit. In Experiment 18, we extend the results of Experiment 11 to a scenario of multiple units against multiple units. Specifically, Experiment 18 tests the Method of Characteristics for the case of three blue units against three red units.


18.2  Purpose  of  the  Experiment 

The  purpose  of  Experiment  18  is  to  test  whether  the  Method  of  Characteristics  works  for  multi-units 
against  multi-units. 


18.3  Hypothesis 

The  method  is  successful  in  the  case  of  multi-units  against  multi-units. 


18.4  Methods 

The  algorithm  is  described  in  paper  [1]. 


18.5  Experiment  Scope 

In this scenario, three blue units (B1: bomber, B2: interceptor, B3: interceptor) encounter three red units (R1: bomber, R2: interceptor, R3: bomber). The dynamics for the characteristic equations have been computed but are not given here explicitly. The parameter values are $a^{B_i} = a^{R_i} = 0.05$, $b^{B_i} = b^{R_i} = 0.0$, and $p^{B_i} = p^{R_i} = 0.0$. When control penalties are used, the parameter values are RBi = RB i = RBi = R** = 150, and RBi = RRi = 2.5, for i = 1, 2, 3.


18.6 Experiment Results

The results of the experiment are shown in the following figures. Figure 18.1 shows the Nash trajectory for the Blue and Red units. Figure 18.2 shows the Nash engagement intensities as well as the Nash number of platforms for both forces.


Figure 18.1: Nash Trajectories for Three Units vs. Three Units (panel title: Backward Integration)


18.7  Analysis 

The results obtained by the Method of Characteristics for multi-units against multi-units match very closely those obtained by the Sequential Linear-Quadratic Method.


18.8  Conclusion 

The  Method  of  Characteristics  is  a  feasible  procedure  for  verifying  the  results  of  the  Sequential  Linear- 
Quadratic  Method. 


[Figure panel labels: Red Platforms; Firing Intensity]


Bibliography 


[1] I. N. Katz, H. Mukai, H. Schattler, and Mingjun Zhang, Solution of a differential game formulation of military air operations by the method of characteristics, Proc. of the 2001 American Control Conference, Washington, D.C., June 2001.




Chapter  19 


Experiment  19:  New  Game  Flow 
Models 

19.1  Executive  Summary 

Military  operations  can  be  viewed  as  a  hierarchical  structure  in  which  actions  are  taken  by  individual 
units  at  a  low  level,  based  on  strategies  developed  by  planners  at  a  high  level.  In  this  experiment  we 
consider  the  situation  in  which  two  forces,  say  the  blue  and  red  forces,  control  a  large  number  of  units 
distributed  over  a  large  geographical  area.  We  develop  a  tool  that  is  useful  to  high-level  planners  in 
simulating  and  computing  the  optimal  strategy  for  the  two  forces.  We  also  report  the  results  of  our 
numerical  experiments. 

The  geographical  area  in  our  model  is  represented  by  an  abstract  game  board  that  is  divided  into 
cells  so  that  the  strength  concentration  of  the  blue  (resp.  red)  force  in  a  cell  is  defined  as  the  number  of 
blue  (resp.  red)  units  contained  in  the  cell  divided  by  the  area  of  the  cell.  The  game  is  concurrent  in  the 
sense  that  both  the  blue  and  red  forces  can  move  some  or  all  of  their  respective  units  simultaneously  and 
continuously  during  the  game. 

We formulated the military operation control problem as a differential game over the abstract game board. The differential game consists of a quadratic payoff function and a set of ordinary differential equations describing the system dynamics of the unit distribution over the discretized geographical area (the abstract game board).

In order to solve such a geographically distributed differential game, we developed a computational method for finding a local Nash solution to the adversarial game. The optimum strategy for each force is found using an iterative algorithm called the Sequential Linear-Quadratic Method. Experimental results are also presented that demonstrate the validity of this concept.


19.2  Introduction 

Military  operations,  and  many  types  of  games,  can  be  characterized  by  a  hierarchical  structure  in  which 
actions  are  performed  by  individual  units  at  the  low  level,  based  on  strategies  developed  by  planners  at 
the  high  level  [1,  2].  The  Game  Flow  program  represents  a  high-level  tool  that  may  be  used  by  planners 
to  simulate,  and  obtain  an  optimal  strategy  for  an  adversarial  game  in  which  two  forces,  say  the  blue  and 
red  forces,  separately  control  a  large  number  of  units  distributed  over  a  large  geographical  area. 

The geographical area in our model is represented by an abstract game board that is divided into cells so that the strength concentration of the blue (resp. red) force in a cell is defined as the number of blue (resp. red) units contained in the cell divided by the area of the cell. The game is concurrent in the sense that both the blue and red forces can move some or all of their respective units simultaneously and continuously during the game.

Movement of units is achieved by specifying transport velocities for each pair of contiguous cells, i.e., the rate at which the units are shifted from one cell to the next. At the start of a game, the two forces are assigned an initial strength distribution over the cells in the game board. As the game proceeds, the respective strength distributions evolve in different ways, but the total strength of each force can only decrease due to attrition caused by enemy attacks, mechanical breakdown, etc.

The  game  is  carried  out  for  a  specified  amount  of  time,  with  the  three  processes  of  the  game,  i.e., 
unit  movement,  attack  and  attrition  (other  than  that  associated  with  attack),  evolving  uninterrupted  in 
parallel  for  the  duration  of  the  game. 

The goal of each force is to reach the end of the game with a minimum loss of its own strength, while inflicting maximum damage on the opposing force. Also, each force may assign higher payoff values (larger weights) to some of the cells in the game board than to other cells, so a higher score might be earned by finishing the game with heavier strength concentration in more valuable cells. Finally, movement of units across the game board typically costs valuable energy, so the energy expenditure for movement is also included in the payoff function for each force.

The control strategy of each force is defined as the transport velocities that a force assigns to all the pairs of adjacent cells in the game board during a game. To compute the strategy of each force, we use a game-theoretic solution engine (i.e., the Sequential Linear-Quadratic algorithm) [4] that is designed to converge to a Nash solution of the game.

19.3  Mathematical  Model 

In  order  to  derive  equations  which  model  the  evolution  for  the  strength  distributions  of  the  two  forces  on 
the  game  board,  we  use  an  Eulerian  approach  (see,  e.g.,  [3]). 

The game board is divided into a grid with uniform mesh size $\Delta x \times \Delta y$. The evolution of the strength concentration of the blue force in a cell can be computed by balancing the net inflow of blue units through the boundaries of the cell against the loss of units due to attrition.

The superscript $B$ (resp. $R$) is affixed to parameters and variables that belong to the blue (resp. red) force. The coordinates of a typical cell in the grid are indicated by the subscripts $mn$. The strength concentration of the blue force inside the $mn$-th cell at time $t$ is represented by the symbol $\rho^B_{mn}$, and corresponds to the number of blue units in the cell divided by the mesh area $\Delta x\,\Delta y$. The symbol $\mu^B_{mn}$ represents the transport velocity in the $x$ direction for the blue units from the $mn$-th to the $(m+1)n$-th cell. The symbol $\nu^B_{mn}$ represents the transport velocity in the $y$ direction for the blue units from the $mn$-th to the $m(n+1)$-th cell. Similarly, the symbols $\mu^B_{m-1,n}$ and $\nu^B_{m,n-1}$ represent the transport velocities for the blue units from the $(m-1)n$-th and $m(n-1)$-th cells, respectively, to the $mn$-th cell. Here the transport velocities, $\mu$ and $\nu$, represent distance travelled per length of the cells per unit of time.

The net inflow of blue units into the $mn$-th cell can be written as

$$
\begin{aligned}
&-\left(\rho^B_{mn} H^+[\mu^B_{mn}] + \rho^B_{m+1,n} H^-[\mu^B_{mn}]\right)\mu^B_{mn}
 + \left(\rho^B_{m-1,n} H^+[\mu^B_{m-1,n}] + \rho^B_{mn} H^-[\mu^B_{m-1,n}]\right)\mu^B_{m-1,n} \\
&-\left(\rho^B_{mn} H^+[\nu^B_{mn}] + \rho^B_{m,n+1} H^-[\nu^B_{mn}]\right)\nu^B_{mn}
 + \left(\rho^B_{m,n-1} H^+[\nu^B_{m,n-1}] + \rho^B_{mn} H^-[\nu^B_{m,n-1}]\right)\nu^B_{m,n-1},
\end{aligned}
$$

where $H^+$ is the standard Heaviside function, which gives a value of one for a positive argument and zero for a negative argument, and $H^-$ is defined as $H^- = 1 - H^+$. Notice that the strength, $\rho$, is a positive quantity by definition.

The attrition process of the force itself causes a decrease in strength concentration inside the cell, proportional to the instantaneous value of the strength concentration in the cell:

$$
-\alpha^B_{mn}\,\rho^B_{mn},
$$

where $\alpha^B_{mn}$ is called the local self-attrition parameter for the blue units in the $mn$-th cell.

The additional loss of strength due to attack from nearby red units occurs at a rate determined by the surface convolution of an efficiency-of-attack function $\beta^R$ and the strength of the red units distributed in the neighborhood:

$$
-\sum_k \sum_l \beta^R_{m-k,\,n-l}\,\rho^R_{kl}\,\rho^B_{mn}.
$$

With the adoption of the efficiency-of-attack function, it is possible to model the dependence of attack efficiency on the distance between the attacker and the attacked.

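Adding the above three terms, we obtain the rate of increase of the strength concentration of the blue force inside the $mn$-th cell:

$$
\begin{aligned}
\dot\rho^B_{mn} = {} & -\left(\rho^B_{mn} H^+[\mu^B_{mn}] + \rho^B_{m+1,n} H^-[\mu^B_{mn}]\right)\mu^B_{mn} \\
& + \left(\rho^B_{m-1,n} H^+[\mu^B_{m-1,n}] + \rho^B_{mn} H^-[\mu^B_{m-1,n}]\right)\mu^B_{m-1,n} \\
& - \left(\rho^B_{mn} H^+[\nu^B_{mn}] + \rho^B_{m,n+1} H^-[\nu^B_{mn}]\right)\nu^B_{mn} \\
& + \left(\rho^B_{m,n-1} H^+[\nu^B_{m,n-1}] + \rho^B_{mn} H^-[\nu^B_{m,n-1}]\right)\nu^B_{m,n-1} \\
& - \alpha^B_{mn}\,\rho^B_{mn} \\
& - \sum_k \sum_l \beta^R_{m-k,\,n-l}\,\rho^R_{kl}\,\rho^B_{mn}. \qquad (19.1)
\end{aligned}
$$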

This  equation  is  for  the  blue  force.  A  similar  equation  is  derived  for  the  red  force. 
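
The cell balance described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Game Flow program itself: the function names, the nested-list layout, and the callable `beta` standing in for the efficiency-of-attack function are all assumptions made for the example, and the loss terms are taken with a negative sign because the text describes them as losses.

```python
def heaviside_plus(x):
    """H+: one for a positive argument, zero otherwise."""
    return 1.0 if x > 0 else 0.0


def blue_rate(rho_b, rho_r, mu, nu, alpha, beta):
    """Rate of change of the blue strength concentration on an M x N grid.

    rho_b[m][n], rho_r[m][n]: blue and red strength concentrations.
    mu[m][n]: x-velocity from cell (m, n) to (m+1, n), for m = 0..M-2.
    nu[m][n]: y-velocity from cell (m, n) to (m, n+1), for n = 0..N-2.
    alpha[m][n]: local self-attrition parameter.
    beta(dm, dn): hypothetical efficiency-of-attack kernel (an assumption).
    """
    M, N = len(rho_b), len(rho_b[0])

    def flux(upstream, downstream, vel):
        # Upwind choice: a positive velocity carries the upstream strength,
        # a negative velocity carries the downstream strength.
        hp = heaviside_plus(vel)
        return (upstream * hp + downstream * (1.0 - hp)) * vel

    rate = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            r = 0.0
            # No flow crosses the outer boundary, so edge faces are skipped.
            if m < M - 1:
                r -= flux(rho_b[m][n], rho_b[m + 1][n], mu[m][n])
            if m > 0:
                r += flux(rho_b[m - 1][n], rho_b[m][n], mu[m - 1][n])
            if n < N - 1:
                r -= flux(rho_b[m][n], rho_b[m][n + 1], nu[m][n])
            if n > 0:
                r += flux(rho_b[m][n - 1], rho_b[m][n], nu[m][n - 1])
            # Self-attrition and attack by red units in the neighborhood.
            r -= alpha[m][n] * rho_b[m][n]
            attack = sum(beta(m - k, n - l) * rho_r[k][l]
                         for k in range(M) for l in range(N))
            r -= attack * rho_b[m][n]
            rate[m][n] = r
    return rate
```

With zero attrition and a zero attack kernel, the rates over all cells sum to zero, which reflects the statement that total strength can only decrease through attrition or attack.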


19.4  Solution  of  the  System  Equations 

In  this  work  we  only  consider  rectangular  game  boards  where  there  is  no  flow  of  units  across  the  boundaries 
of  the  board  (Neumann  boundary  conditions).  Furthermore,  the  game  board  is  divided  into  rectangular 
cells  arranged  in  a  regular  grid  pattern,  and  the  strength  concentrations  for  the  two  forces,  for  each  cell, 
are  defined  at  the  center  of  the  cell.  Velocities  are  defined  on  the  boundaries  between  each  pair  of  adjacent 
cells.  Figure  19.1  illustrates  the  numbering  scheme  for  the  cells  in  a  4  x  4  grid. 


Figure  19.1:  Cell  numbering  scheme. 


Since equations similar to equation (19.1) can be formulated for each cell in the game area, we need to solve a system of nonlinear differential equations with the following general form:

$$
\dot\rho = \begin{bmatrix} \dot\rho^B \\ \dot\rho^R \end{bmatrix} = L[\rho, \mu, \nu] \qquad (19.2)
$$
To construct the general form of the system equations, the strength concentrations and velocities are assembled as vectors as follows:

$$
\begin{aligned}
\rho^B &= [\rho^B_{00}, \rho^B_{10}, \ldots, \rho^B_{M-1,0}, \rho^B_{01}, \ldots, \rho^B_{M-1,N-1}]', \\
\rho^R &= [\rho^R_{00}, \rho^R_{10}, \ldots, \rho^R_{M-1,0}, \rho^R_{01}, \ldots, \rho^R_{M-1,N-1}]', \\
\mu^B &= [\mu^B_{00}, \mu^B_{10}, \ldots, \mu^B_{M-2,0}, \mu^B_{01}, \ldots, \mu^B_{M-2,N-1}]', \\
\mu^R &= [\mu^R_{00}, \mu^R_{10}, \ldots, \mu^R_{M-2,0}, \mu^R_{01}, \ldots, \mu^R_{M-2,N-1}]', \qquad (19.3) \\
\nu^B &= [\nu^B_{00}, \nu^B_{10}, \ldots, \nu^B_{01}, \ldots, \nu^B_{M-1,N-2}]', \\
\nu^R &= [\nu^R_{00}, \nu^R_{10}, \ldots, \nu^R_{01}, \ldots, \nu^R_{M-1,N-2}]'.
\end{aligned}
$$

Notice that the size of the $\rho^B, \rho^R$ vectors is $K = MN$, while the size of the $\mu^B, \mu^R$ vectors is $L_1 = (M-1)N$ and the size of the $\nu^B, \nu^R$ vectors is $L_2 = M(N-1)$. The reason for the different vector sizes is that the normal components of the velocities are known to be always zero at the boundaries of the game area, so we do not have to solve for these terms. Thus, for example, the mapping from the cell position index $(m,n)$ to the vector position index $k$ for $(\rho^B, \rho^R)$ is given by $k = Mn + m$, where $k$ is an integer between $0$ and $MN - 1$.
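
The index mapping and vector sizes just described can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this example; the inverse map is included as a convenience and is not stated in the text.

```python
def cell_to_index(m, n, M):
    """Map the cell position (m, n) to the vector position k = M*n + m."""
    return M * n + m


def index_to_cell(k, M):
    """Inverse map: recover (m, n) from the vector position k."""
    n, m = divmod(k, M)
    return m, n


def vector_sizes(M, N):
    """Sizes K, L1 and L2 of the rho, mu and nu vectors of one force."""
    return M * N, (M - 1) * N, M * (N - 1)
```

For the 4 x 4 grid of Figure 19.1 this gives `vector_sizes(4, 4) == (16, 12, 12)`, and cell (3, 2) maps to position 11.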

In  general,  the  system  of  equations  (19.2)  represents  a  complicated  flow  field  where  the  velocities  can 
be  arbitrarily  chosen  by  the  controller.  For  example,  one  cell  may  act  at  one  moment  as  a  finite  source 
of  units  and  then  turn  into  a  sink  at  the  next  moment.  In  our  experiments  we  found  good  results  using 
a  fourth-order  Runge-Kutta  method  with  variable  step  size  (Matlab  ode  solver  rk45). 
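
To make the source-to-sink behaviour concrete, the following Python sketch integrates a one-dimensional chain of cells with a classical fixed-step fourth-order Runge-Kutta scheme. It is a simplified stand-in, not the variable-step solver used in the experiments; the chain geometry, the function names, and the time-dependent velocity callback are assumptions made for illustration.

```python
def chain_rate(rho, vel):
    """Upwind flow on a 1-D chain of cells with closed ends.

    vel[j] is the transport velocity from cell j to cell j+1.
    """
    rate = [0.0] * len(rho)
    for j, v in enumerate(vel):
        upstream = rho[j] if v > 0 else rho[j + 1]
        flux = upstream * v
        rate[j] -= flux
        rate[j + 1] += flux
    return rate


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    k1 = f(t, y)
    k2 = f(t + h / 2, shifted(y, k1, h / 2))
    k3 = f(t + h / 2, shifted(y, k2, h / 2))
    k4 = f(t + h, shifted(y, k3, h))
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def simulate(rho0, velocity_at, t_end, steps):
    """Integrate the chain from t = 0 to t_end with a fixed step size."""
    h = t_end / steps
    t, rho = 0.0, list(rho0)
    for _ in range(steps):
        rho = rk4_step(lambda s, y: chain_rate(y, velocity_at(s)), t, rho, h)
        t += h
    return rho
```

For example, `simulate([1.0, 1.0, 1.0], lambda t: [0.5, 0.5] if t < 1.0 else [-0.5, -0.5], 2.0, 200)` makes the first cell act as a source during the first time unit and as a sink afterwards, while the total strength stays constant because no attrition is modelled.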


19.5  Differential  Game 


Given  initial  strength  distributions  for  the  blue  and  red  forces,  we  wish  to  generate  strategies  for  the 
two  forces  so  that,  at  the  end  of  the  game,  each  force  has  achieved  an  optimum  balance  with  respect  to 
its  different  goals.  In  other  words,  we  seek  a  Nash  equilibrium  solution  to  a  zero-sum  differential  game 
defined  as  follows: 


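$$
J^* = \min_{\mu^B,\,\nu^B}\ \max_{\mu^R,\,\nu^R}\ J(\mu^B, \mu^R, \nu^B, \nu^R) \quad \text{subject to (19.2)} \qquad (19.4)
$$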


where  the  blue  force  tries  to  minimize  the  payoff  function  J  while  the  red  force  tries  to  maximize  the 
same  payoff  function.  The  Nash  equilibrium  value  J*  of  problem  (19.4),  if  it  exists,  is  called  the  value  of 
the  game. 

To solve (19.4) we assume that the $\mathbb{R}^{2K}$-valued variables, $\rho = (\rho^B, \rho^R)$, are continuous functions of time defined on the interval $[t_0, t_f]$. We also assume that the velocities, $\mu = (\mu^B, \mu^R) \in \mathbb{R}^{2L_1}$ and $\nu = (\nu^B, \nu^R) \in \mathbb{R}^{2L_2}$, are continuous functions defined on the same interval.


19.6  Solution  of  the  Differential  Game 

The  numerical  method  used  to  seek  a  local  Nash  equilibrium  solution  of  the  differential  game  defined 
by  equation  (19.4),  with  a  given  payoff  function  subject  to  the  dynamic  system  defined  by  (19.2),  is  the 
Sequential  Linear-Quadratic  (SLQ)  Algorithm  [4]. 

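The SLQ algorithm tries to find a local Nash equilibrium solution by an iterative process in which the system differential equation is linearized around the $i$-th solution state estimate, $\rho_i[\mu_i, \nu_i]$, so that the $i$-th step consists of solving the following game subproblem:

$$
\begin{aligned}
\min_{\delta\mu^B,\,\delta\nu^B}\ \max_{\delta\mu^R,\,\delta\nu^R}\ \{\, & J(\rho_i + \delta\rho,\ \mu_i + \delta\mu,\ \nu_i + \delta\nu)\ \mid \\
& \delta\dot\rho(t) = L_\rho[\rho_i(t), \mu_i(t), \nu_i(t)]\,\delta\rho(t) \\
& \qquad + L_\mu[\rho_i(t), \mu_i(t), \nu_i(t)]\,\delta\mu(t) \\
& \qquad + L_\nu[\rho_i(t), \mu_i(t), \nu_i(t)]\,\delta\nu(t),\ t \in (t_0, t_f];\ \delta\rho(t_0) = 0 \,\}
\end{aligned} \qquad (19.5)
$$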
where $J(\rho, \mu, \nu) = J(\mu^B, \mu^R, \nu^B, \nu^R)$ is assumed to be a quadratic function. We use $\delta\mu$ and $\delta\nu$ to represent small perturbations, $\mu - \mu_i$ and $\nu - \nu_i$, of the respective controls, $\mu$ and $\nu$, from the current estimate, $\mu_i$ and $\nu_i$. And $\delta\rho$ represents the corresponding perturbation, $\rho[\mu_i + \delta\mu, \nu_i + \delta\nu] - \rho[\mu_i, \nu_i]$, in the state trajectory.

An approximate solution to the state trajectory perturbation $\delta\rho$ subject to the perturbed inputs, $\delta\mu$ and $\delta\nu$, can be obtained by linearizing the system differential equation around $(\rho_i, \mu_i, \nu_i)$:

$$
\begin{aligned}
\delta\dot\rho = {} & L_\rho[\rho^B_i, \rho^R_i, \mu^B_i, \mu^R_i, \nu^B_i, \nu^R_i]\,\delta\rho \\
& + L_\mu[\rho^B_i, \rho^R_i, \mu^B_i, \mu^R_i, \nu^B_i, \nu^R_i]\,\delta\mu \\
& + L_\nu[\rho^B_i, \rho^R_i, \mu^B_i, \mu^R_i, \nu^B_i, \nu^R_i]\,\delta\nu. \qquad (19.6)
\end{aligned}
$$

The current solution estimate $(\mu_i, \nu_i)$ can then be updated by

$$
\begin{aligned}
\mu_{i+1} &= \mu_i + s_i\,\delta\mu_i, \\
\nu_{i+1} &= \nu_i + s_i\,\delta\nu_i,
\end{aligned} \qquad (19.7)
$$

with the step size $s_i > 0$, where $(\delta\mu_i, \delta\nu_i)$ denotes the Nash solution to the subgame (19.5).

The original nonlinear system (19.2) is solved next using the inputs $(\mu_{i+1}, \nu_{i+1})$ to obtain the new trajectory $\rho_{i+1} = \rho[\mu_{i+1}, \nu_{i+1}]$. Successive iterations proceed in the same way as before, until the norm of the update velocities $(\delta\mu_i, \delta\nu_i)$ reduces to an acceptable level. Recall that when the payoff function is quadratic, the computation of the Nash velocities, $(\delta\mu_i, \delta\nu_i)$, for the above subproblem involves the solution of a system of Riccati differential equations.
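
The structure of the iteration (linearize, solve a linear-quadratic subgame, update with a step size, stop when the update is small) can be illustrated on a deliberately tiny static problem. The sketch below is not the SLQ algorithm of [4]: it has no time dimension and no Riccati equations, and the scalar "dynamics" `state`, the weights `q`, `r_u`, `r_v`, and all function names are assumptions invented for the example. Blue chooses `u` to minimize and red chooses `v` to maximize the payoff.

```python
def state(u, v):
    """Hypothetical nonlinear map from the controls (u, v) to a scalar state."""
    return 1.0 + u - 0.5 * v + 0.1 * u * u


def state_slopes(u, v):
    """Partial derivatives of the state with respect to u and v."""
    return 1.0 + 0.2 * u, -0.5


def payoff_gradient(u, v, q=1.0, r_u=1.0, r_v=2.0):
    """Gradient of J = q*x^2/2 + r_u*u^2/2 - r_v*v^2/2 with x = state(u, v)."""
    x = state(u, v)
    a, b = state_slopes(u, v)
    return q * x * a + r_u * u, q * x * b - r_v * v


def slq_toy(u=0.0, v=0.0, q=1.0, r_u=1.0, r_v=2.0,
            step=1.0, tol=1e-10, max_iter=50):
    """Iterate: linearize the state, solve the quadratic subgame, update."""
    for i in range(max_iter):
        x = state(u, v)
        a, b = state_slopes(u, v)
        # Stationarity conditions of the linear-quadratic subgame in (du, dv):
        # convex in du (a11 > 0) and concave in dv (a22 < 0), i.e. a saddle.
        a11, a12 = q * a * a + r_u, q * a * b
        a21, a22 = q * a * b, q * b * b - r_v
        b1 = -(q * a * x + r_u * u)
        b2 = -(q * b * x - r_v * v)
        det = a11 * a22 - a12 * a21
        du = (b1 * a22 - a12 * b2) / det
        dv = (a11 * b2 - a21 * b1) / det
        # Stop when the norm of the control update is small enough.
        if (du * du + dv * dv) ** 0.5 < tol:
            return u, v, i
        u, v = u + step * du, v + step * dv
    return u, v, max_iter
```

At the returned point both components of `payoff_gradient` are close to zero, so neither player can improve locally, which is the local Nash (saddle-point) condition for this toy problem.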


19.7  Experimental  Results 

The  following  two  experiments  illustrate  the  utility  of  the  Game  Flow  program. 

19.7.1  Experiment  1 

The game area is a square of unit length on each side, and is divided into 64 squares to form an 8 x 8 grid.

The game board may represent some geographical area where the conflict takes place. It should be expected, then, that certain local features of the game area will have an effect on the evolution of the game. For example, the energy that the forces must spend to move their respective assets should vary as they attempt to go across different types of terrain: desert dunes, marshy land, dense forests, etc. Similarly, different features of the game area might affect the attrition rate sustained by the forces.

In this experiment, the game area consists of two types of terrain: one smooth area, through which movement of assets is relatively easy, surrounded by more difficult terrain. The type of terrain is reflected in the running cost on velocity as shown in Figure 19.2, where the smooth area is indicated in dark color. Notice that, for example, to go from cell (4,4) to cell (5,5), the path that goes through cell (5,4) is less expensive than the path that goes through cell (4,5).


Figure 19.2: Game Board (clockwise starting from top left figure: terminal payoff associated with final values of the strength concentration in the cells; running cost associated with instantaneous values of the strength concentration in the cells; local attrition; and running cost on velocity). Darker shade indicates higher value.
Figure 19.3: Initial strength distribution of the blue force.


Figure 19.4: Initial strength distribution of the red force.

To simplify visualization of the experiment results, attrition not caused by the enemy is not included in this experiment (note that in Figure 19.2 the local attrition distribution is uniformly zero).

The  blue  force  has  an  initial  strength  of  2  units  spread  uniformly  on  the  game  area,  as  shown  in  Figure 
19.3.  The  red  force  has  an  initial  strength  of  3  units,  also  spread  uniformly  on  the  game  area,  as  shown 
in  Figure  19.4. 

We assume that each force has a symmetric attack efficiency function as depicted in Figure 19.5. The figure shows that the red force is slightly more powerful than the blue force at close range. On the other hand, the blue force has a longer range.

The  objective  for  the  blue  (resp.  red)  force  consists  of  the  following  five  goals:  (a)  reach  the  end  of 
the  game  with  as  much  strength  as  possible;  (b)  place  as  many  units  as  possible  in  the  four  cells  located 
in  the  middle  of  the  game  board;  (c)  remove  the  red  (resp.  blue)  units  from  the  four  central  cells,  and 
block  any  attempts  by  the  red  (resp.  blue)  force  to  move  its  own  units  into  that  area;  (d)  continuously 
try  to  minimize  the  strength  of  the  red  (resp.  blue)  force;  and  (e)  minimize  the  energy  expenditure  in 
accomplishing  the  first  four  goals  of  this  mission  statement. 

While  the  respective  mission  statements  for  the  blue  and  red  forces  may  look  identical,  each  force  can 
assign  different  priorities  (weights)  to  the  different  goals.  For  instance,  in  this  experiment,  the  weight  for 
the  blue  force  associated  with  movement  of  units  over  the  smooth  terrain  of  the  game  board  is  equivalent 
to  three  fourths  the  corresponding  weight  for  the  red  force.  This  means  that  the  blue  force  has  more 
freedom  of  movement  over  the  game  board  than  the  red  force. 
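Figure 19.5: Efficiency of attack for the blue and red forces.


The overall objective of the game is defined by the following quadratic payoff function (also see Figure 19.2):

$$
\begin{aligned}
J(\mu^B, \mu^R, \nu^B, \nu^R) = {} & \int_{t_0}^{t_f} \Big\{ \tfrac12\,\rho^B(t)'[Q^{BB} - Q^{BR}]\rho^B(t) \\
& + \tfrac12\,\rho^R(t)'[Q^{RB} - Q^{RR}]\rho^R(t) \\
& + \tfrac12\,\mu^B(t)'[R^B_\mu]\mu^B(t) + \tfrac12\,\nu^B(t)'[R^B_\nu]\nu^B(t) \\
& + \tfrac12\,\mu^R(t)'[R^R_\mu]\mu^R(t) + \tfrac12\,\nu^R(t)'[R^R_\nu]\nu^R(t) \Big\}\,dt \\
& + \tfrac12\,\rho^B(t_f)'[Q^B_f]\rho^B(t_f) + \tfrac12\,\rho^R(t_f)'[Q^R_f]\rho^R(t_f),
\end{aligned} \qquad (19.8)
$$

where all the weighting matrices are diagonal. The elements of $Q^{BB} \in \mathbb{R}^{K \times K}$, $Q^{BR} \in \mathbb{R}^{K \times K}$, $Q^{RB} \in \mathbb{R}^{K \times K}$ and $Q^{RR} \in \mathbb{R}^{K \times K}$ are chosen in accordance with the goals (b)-(d) of each force as defined above; the elements of $R^B_\mu \in \mathbb{R}^{L_1 \times L_1}$, $R^R_\mu \in \mathbb{R}^{L_1 \times L_1}$, $R^B_\nu \in \mathbb{R}^{L_2 \times L_2}$ and $R^R_\nu \in \mathbb{R}^{L_2 \times L_2}$ reflect goal (e); and the elements of $Q^B_f \in \mathbb{R}^{K \times K}$ and $Q^R_f \in \mathbb{R}^{K \times K}$ correspond to the terminal cost associated with goal (a).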

The  solution  technique  (SLQ  method)  [4]  is  iterative  and  it  improves  the  current  solution  estimate  at 
each  iteration.  Hence,  to  solve  the  game,  an  initial  guess  has  to  be  made  for  the  strategy  used  by  the 
forces.  In  this  experiment,  the  initial  choice  of  strategy  affected  the  rate  of  convergence  in  the  iterative 
solution  of  the  game,  but  it  did  not  have  significant  effects  on  the  final  outcome  of  the  game.  So,  we 
have  arbitrarily  assigned  a  constant  velocity  distribution  such  that  the  units  of  the  blue  and  red  forces 
converge  in  the  middle  of  the  game  board. 

The  value  of  the  game  corresponding  to  the  initial  solution  estimate  is  shown  in  Table  19.1,  broken 
into  the  individual  cost  components. 

Table 19.1: Payoff function value for the initial solution estimate

Force    $J_\mu$    $J_\nu$    $J_\rho$    $J_{f\rho}$    $J$
Blue     17600      18800      -353.4      -1273.6        35534
Red      -20800     -21400     647.5       1287.8         -40264
Total                                                     -730

The SLQ algorithm was used next to find a Nash equilibrium solution for the game. In this experiment, the solver was stopped after 70 iterations, when the error (i.e., the norm $\|(\delta\mu_i, \delta\nu_i)\|$ of the velocity updates) was approximately 0.8 percent of the original error. At this error level, further iterations had no significant effects on the solution.

The  value  of  the  game  corresponding  to  the  Nash  equilibrium  solution  is  shown  in  Table  19.2,  broken 
into  the  individual  cost  components,  corresponding  to  the  running  costs  on  velocity  and  strength,  and 
terminal  cost  on  strength. 

Table 19.2: Payoff function value for the Nash equilibrium solution

Force    $J_\mu$    $J_\nu$    $J_\rho$    $J_{f\rho}$    $J$
Blue     27.6       23.6       -259.3      -584.1         -792.2
Red      -40.4      -38.3      490.9       1110.0         1522.2
Total                                                     730.0
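
As a quick consistency check, the rows of Table 19.2 can be summed in a few lines of Python. The numbers are copied from the table; the reading of the first two columns as the running costs on the two velocity components is an assumption based on the description of the cost components, and the variable names are chosen only for this example.

```python
# Cost components of Table 19.2: two running costs on velocity, the running
# cost on strength, and the terminal cost on strength (column order assumed).
blue = [27.6, 23.6, -259.3, -584.1]
red = [-40.4, -38.3, 490.9, 1110.0]


def total(components):
    """Sum of the cost components, rounded to one decimal as in the table."""
    return round(sum(components), 1)


blue_total = total(blue)
red_total = total(red)
game_value = round(blue_total + red_total, 1)
```

The component sums reproduce the per-force payoffs in the last column, and their sum reproduces the total in the last row of the table.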

Clearly, the Nash equilibrium solution found by the SLQ algorithm greatly improves the performance of the two forces in terms of the value of the payoff function selected for this experiment. Recall that the blue force tries to minimize the total payoff while the red force tries to maximize it.

Figures 19.6 and 19.7 show the initial Nash strength distributions for the blue and red forces. The distributions are identical and uniform, so the shade (color) is uniform over the area. The direction of unit movement across the border between each pair of cells is indicated by an arrow. The size of each arrow indicates the magnitude of an initial velocity component.


Figure 19.6: Initial Nash Strength Distribution for Blue force. Arrows indicate magnitudes of the velocity components across the boundaries.


Figure 19.7: Initial Nash Strength Distribution for Red force. Arrows indicate magnitudes of the velocity components across the boundaries.

Figures 19.8 and 19.9 show the final Nash strength distributions for the blue and red forces. The arrows indicate the magnitudes of the velocity components as the respective units of the two forces reached their final destinations.


Figure 19.8: Final Nash Strength Distribution for Blue force. Arrows indicate magnitudes of the velocity components across the boundaries.


Figure 19.9: Final Nash Strength Distribution for Red force. Arrows indicate magnitudes of the velocity components across the boundaries.

The final total strength for the blue force was 0.32, while the final total strength for the red force was 0.46. Therefore, the red force conserved only 15.4% of its original strength, while the blue force conserved 16.1% of its own strength.
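
As a rough check of these percentages, using the initial strengths of 2 units (blue) and 3 units (red) stated earlier for this experiment together with the rounded final totals:

```latex
\[
\frac{0.32}{2} = 0.160 \approx 16\%, \qquad
\frac{0.46}{3} \approx 0.153 \approx 15\% .
\]
```

Both values agree with the reported 16.1% and 15.4% to within the rounding of the final totals to two decimals.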

Qualitatively speaking, we can say that, in this scenario, the superiority of the blue force in attack range prevailed, allowing the blue units to keep the red units out of the most valuable cells in the center of the board. This can be better appreciated in Figures 19.10 and 19.11. Indeed, at the end of the game, the red units were forced to retreat into the four corners of the game board, where less valuable cells were to be found. It is also clear that the different transportation costs assigned to different regions affected the way in which the blue and red forces adjusted their strength concentrations during the game. This can be seen in the fact that the red force's final concentration in the (8,1) corner is weaker than in the other three corners because the terrain near (8,1) is harder to traverse.

Figure 19.10: Final Nash Strength Distribution for Blue force.

Figure 19.11: Final Nash Strength Distribution for Red force.


19.7.2  Experiment  2 

The game board is a square of unit length on each side, and is divided into 36 square cells, which form a 6 x 6 grid.

We  assume  that  the  two  forces  assign  equal  value  to  each  cell,  so  that  a  common  map  of  the  game 
board  showing  the  real  estate  value  is  shown  for  the  two  forces  in  Figure  19.12.  With  respect  to  the 
running  cost  on  velocity,  Figure  19.12  shows  that  the  light  cells  form  a  path  of  smooth  terrain,  through 
which  movement  of  units  is  relatively  easy,  hence  units  spend  relatively  low  energy  when  moving  through 
light  cells;  the  dark  cells  represent  more  difficult  terrain.  Figure  19.12  also  shows  that  there  is  local 
attrition  associated  with  the  terrain.  Units  that  move  into  high  attrition  cells  will  suffer  loss  of  strength 
even  in  the  absence  of  enemy  attack. 

The  blue  force  has  an  initial  strength  of  1/6  units  spread  uniformly  on  the  six  bottom  right  cells  of 
the  game  board,  as  shown  in  Figure  19.13.  The  red  force  has  an  initial  strength  of  1/6  units  spread 
uniformly  on  the  six  top  left  cells  of  the  game  board  as  shown  in  Figure  19.14. 

We  assume  that  each  force  has  a  symmetric  attack  efficiency  function  as  depicted  in  Figure  19.15. 
The  figure  shows  that  the  blue  force  is  more  powerful  than  the  red  force  at  close  range.  On  the  other 
hand,  the  red  force  has  a  longer  range,  covering  an  area  of  approximately  3x3  cells. 

The  objective  for  the  blue  force  consists  of  the  following  three  goals:  (a)  reach  the  end  of  the  game 
with  as  much  strength  as  possible;  (b)  place  as  many  units  as  possible  in  the  most  valuable  cells  located 
in  the  top  right  corner  of  the  game  board;  (c)  minimize  the  energy  expenditure  in  accomplishing  the  first 
two  goals  of  this  mission  statement. 

The  objective  for  the  red  force  consists  of  the  following  three  goals:  (a)  reach  the  end  of  the  game  with 
as  much  strength  as  possible;  (b)  block  the  blue  forces  as  they  try  to  move  towards  the  most  valuable 
cells  located  in  the  top  right  corner  of  the  game  board;  and  (c)  minimize  the  energy  expenditure  in 
accomplishing the first two goals of this mission statement.

Figure 19.12: Game Board (clockwise starting from top left figure: terminal payoff associated with final values of the strength concentration in the cells; running cost associated with instant values of the strength concentration in the cells; local attrition; and running cost on velocity). Darker shade indicates higher value.

Figure 19.13: Initial strength distribution of the blue force.

Figure 19.14: Initial strength distribution of the red force.

Figure 19.15: Efficiency of attack for the blue and red forces.


The  overall  objective  of  the  game  is  defined  by  the  following  quadratic  payoff  function: 

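$$
\begin{aligned}
J(\mu^B, \mu^R, \nu^B, \nu^R) = {} & \int_0^{t_f} \Big\{ \tfrac{1}{2}\, p^B(t)' \left[ Q^{BB} - Q^{BR} \right] p^B(t) \\
& + \tfrac{1}{2}\, p^R(t)' \left[ Q^{RB} - Q^{RR} \right] p^R(t) \\
& + \tfrac{1}{2}\, \mu^B(t)' \left[ R^B_\mu \right] \mu^B(t) + \tfrac{1}{2}\, \nu^B(t)' \left[ R^B_\nu \right] \nu^B(t) \\
& + \tfrac{1}{2}\, \mu^R(t)' \left[ R^R_\mu \right] \mu^R(t) + \tfrac{1}{2}\, \nu^R(t)' \left[ R^R_\nu \right] \nu^R(t) \Big\}\, dt \\
& + \tfrac{1}{2}\, p^B(t_f)' \left[ Q^B_f \right] p^B(t_f) + \tfrac{1}{2}\, p^R(t_f)' \left[ Q^R_f \right] p^R(t_f),
\end{aligned}
\tag{19.9}
$$

where all the weighting matrices are diagonal. The elements of $Q^{BB} \in \mathbb{R}^{K \times K}$, $Q^{BR} \in \mathbb{R}^{K \times K}$, $Q^{RB} \in \mathbb{R}^{K \times K}$ and $Q^{RR} \in \mathbb{R}^{K \times K}$ are chosen in accordance with the goals (b)-(d) of each force as defined above; the elements of $R^B_\mu \in \mathbb{R}^{L_1 \times L_1}$, $R^R_\mu \in \mathbb{R}^{L_1 \times L_1}$, $R^B_\nu \in \mathbb{R}^{L_2 \times L_2}$ and $R^R_\nu \in \mathbb{R}^{L_2 \times L_2}$ reflect goal (e); and the elements of $Q^B_f \in \mathbb{R}^{K \times K}$ and $Q^R_f \in \mathbb{R}^{K \times K}$ correspond to the terminal cost associated with goal (a).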
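
Because every weighting matrix is diagonal, each term of the payoff reduces to a weighted sum of squares. The following minimal Python sketch evaluates a payoff of this form on sampled trajectories, using the trapezoidal rule for the time integral. It is an illustration only, not the Game Flow implementation: the function names, the dictionary keys (`Q_plus`, `Q_minus`, `R_1`, `R_2`, `Q_f`) and the toy data are assumptions introduced here.

```python
def quad(weights, x):
    """Return 0.5 * x' * diag(weights) * x for a diagonal weighting matrix."""
    return 0.5 * sum(w * xi * xi for w, xi in zip(weights, x))


def running_cost(t, traj, weights):
    """Trapezoidal approximation of the time integral of quad(weights, x(t))."""
    total = 0.0
    for k in range(len(t) - 1):
        f0 = quad(weights, traj[k])
        f1 = quad(weights, traj[k + 1])
        total += 0.5 * (f0 + f1) * (t[k + 1] - t[k])
    return total


def payoff(t, strength, vel_1, vel_2, w):
    """Evaluate the quadratic payoff term by term.

    strength, vel_1, vel_2: {"B": samples, "R": samples}, one vector per time
    in t (strength per cell, and the two velocity components of each force).
    w: per-force diagonals of the weighting matrices (hypothetical keys).
    """
    parts = {}
    for force in ("B", "R"):
        wf = w[force]
        # Running weight on strength is a difference of two diagonal matrices.
        q_run = [a - b for a, b in zip(wf["Q_plus"], wf["Q_minus"])]
        parts["J_p_" + force] = running_cost(t, strength[force], q_run)
        parts["J_v1_" + force] = running_cost(t, vel_1[force], wf["R_1"])
        parts["J_v2_" + force] = running_cost(t, vel_2[force], wf["R_2"])
        # Terminal cost uses the last strength sample only.
        parts["J_f_" + force] = quad(wf["Q_f"], strength[force][-1])
    parts["J"] = sum(parts.values())
    return parts


# Toy data (assumed): two cells, one boundary per velocity component.
t = [0.0, 1.0, 2.0]
strength = {"B": [[1.0, 2.0]] * 3, "R": [[0.0, 0.0]] * 3}
vel_1 = {"B": [[1.0]] * 3, "R": [[0.0]] * 3}
vel_2 = {"B": [[0.0]] * 3, "R": [[0.0]] * 3}
weights = {
    f: {"Q_plus": [2.0, 4.0], "Q_minus": [0.0, 0.0],
        "R_1": [3.0], "R_2": [1.0], "Q_f": [1.0, 1.0]}
    for f in ("B", "R")
}
result = payoff(t, strength, vel_1, vel_2, weights)
print(result["J"])
```

Returning the individual components alongside the total mirrors the way the payoff values are reported per component in the tables of this section.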

The solution technique (SLQ method) [4] is iterative: it improves the current solution estimate at each iteration. Hence, to solve the game, an initial guess has to be made for the strategy used by each force. The initial strategy for blue consisted of following the path of smooth terrain at constant velocity throughout the duration of the game. The initial strategy for red consisted of marching one row of cells forward in order to block the smooth path using all its strength. In this particular experiment, the initial choice of strategy was critical in attaining convergence in the iterative solution of the game.

The  value  of  the  game  corresponding  to  the  initial  solution  estimate  is  shown  in  Table  19.3,  broken 
into  the  individual  cost  components. 

The SLQ algorithm was used next to find a Nash equilibrium solution for the game. In this experiment, the solver was stopped after 530 iterations. Figure 19.16 shows the evolution of the error (i.e., the norm $\|(\delta\mu, \delta\nu)\|$ of the velocity updates) as the algorithm converges. Note that the step size had to be reduced from 0.2 to 0.05 after 20 iterations in order to achieve convergence.
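
The role of the step size can be illustrated with a generic damped iteration. The Python sketch below is not the SLQ algorithm: `update` is a hypothetical stand-in for one solver pass that returns a proposed correction, and the toy problem at the end is an assumption chosen only so that the loop runs. What the sketch does take from the text is the step size schedule (0.2, reduced to 0.05 after 20 iterations), the iteration cap of 530, and the use of the norm of the update as the error measure.

```python
import math


def step_size(iteration, initial=0.2, reduced=0.05, switch_at=20):
    """Schedule reported in the text: 0.2 at first, 0.05 after 20 iterations."""
    return initial if iteration < switch_at else reduced


def relaxed_iteration(update, x0, max_iter=530, tol=1e-6):
    """Damped iteration x <- x + step * delta, recording the norm of delta.

    update(x) is a hypothetical callable returning the proposed correction
    delta for the current estimate x.
    """
    x = list(x0)
    errors = []
    for k in range(max_iter):
        delta = update(x)
        err = math.sqrt(sum(d * d for d in delta))
        errors.append(err)
        if err < tol:
            break
        s = step_size(k)
        x = [xi + s * di for xi, di in zip(x, delta)]
    return x, errors


# Toy problem (assumed): the correction always points at a fixed target.
target = [1.0, -2.0]
x_final, errors = relaxed_iteration(
    lambda x: [ti - xi for ti, xi in zip(target, x)], [0.0, 0.0]
)
print(len(errors), errors[-1])
```

A smaller step slows the decay of the error per iteration but makes the iteration more tolerant of corrections that overshoot, which is the usual reason for reducing it when convergence stalls.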
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Table  19.3:  Payoff  function  value  for  the  initial  solution  estimate 


Force    $J_\mu$     $J_\nu$     $J_p$      $J_{fp}$    $J$
Blue     3146080     6350400     -59796     -763535     8673149
Red      -96000      0           16.5       9.8         -95974
Total                                                   8577176

Figure  19.16:  Convergence  of  the  SLQ  algorithm. 


The  value  of  the  game  corresponding  to  the  Nash  equilibrium  solution  is  shown  in  Table  19.4,  broken 
into  the  individual  cost  components,  corresponding  to  the  running  costs  on  velocity  and  strength,  and 
terminal  cost  on  strength. 

Clearly, the Nash equilibrium solution found by the SLQ algorithm greatly improves the performance of the blue force in terms of the value of the selected payoff function. That the Nash equilibrium solution also improves the performance of the red force is less clear. However, when comparing with the initial solution estimate, one should realize that the reduction observed in the terminal cost associated with the strength of the blue force is entirely due to the more effective deployment of the red force corresponding to the Nash equilibrium found by the SLQ algorithm.
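
As a quick arithmetic check of this comparison, the short Python sketch below sums the cost components printed in Tables 19.3 and 19.4 (two running costs on velocity, the running cost on strength, and the terminal cost on strength) for each force. The variable and function names are illustrative. The sums agree with the printed totals to within about one unit, presumably because the printed components are rounded.

```python
# Cost components as printed in Tables 19.3 and 19.4, in column order:
# running cost on velocity (two columns), running cost on strength,
# terminal cost on strength.
initial = {
    "Blue": [3146080, 6350400, -59796, -763535],
    "Red": [-96000, 0, 16.5, 9.8],
}
nash = {
    "Blue": [101086, 185527, -13027, -702722],
    "Red": [-103863, 0, 7.46, 2.45],
}


def totals(table):
    """Sum the components of each force, then add a game total."""
    per_force = {force: sum(parts) for force, parts in table.items()}
    per_force["Total"] = sum(per_force.values())
    return per_force


initial_totals = totals(initial)
nash_totals = totals(nash)
for name in ("Blue", "Red", "Total"):
    change = nash_totals[name] - initial_totals[name]
    print(name, initial_totals[name], nash_totals[name], change)
```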

Figures  19.17  and  19.18  show  the  initial  Nash  strength  distributions  for  the  blue  and  red  forces.  The 
direction  of  unit  movement  across  the  border  between  each  pair  of  cells  is  indicated  by  an  arrow.  The 
size  of  each  arrow  indicates  the  magnitude  of  the  initial  velocity  component. 

Figures  19.19  and  19.20  show  the  final  Nash  strength  distributions  for  the  blue  and  red  forces.  The 
arrows  indicate  the  magnitudes  of  the  velocity  components  as  the  respective  units  of  the  two  forces 
reached  their  final  destinations. 

To aid visualization of the results, Figures 19.21 and 19.22 show a three-dimensional view of the final Nash strength distributions for the blue and red forces.

Another view of the results is shown in Figure 19.23. This shows how the respective strength distributions change over time. The figure shows snapshots of the strength distributions for the blue and red forces at the beginning and the end of the game, and at two intermediate points.

Figure 19.17: Initial Nash Strength Distribution for Blue force. Arrows indicate magnitudes of the velocity components across the boundaries.

Figure 19.18: Initial Strength Distribution for Red force. Arrows indicate magnitudes of the velocity components across the boundaries.

Figure 19.19: Final Nash Strength Distribution for Blue force. Arrows indicate magnitudes of the velocity components across the boundaries.

Figure 19.20: Final Nash Strength Distribution for Red force. Arrows indicate magnitudes of the velocity components across the boundaries.

Table 19.4: Payoff function value for the Nash equilibrium solution

Force    Running $J_\mu$    Running $J_\nu$    Running $J_p$    Terminal $J_{fp}$    Game $J$
Blue     101086             185527             -13027           -702722              -429135
Red      -103863            0                  7.46             2.45                 -103853
Total                                                                                -532988


Figure  19.23:  Progression  of  the  instantaneous  Nash  Strength  Distributions  for  Blue  and  Red  forces. 
Arrows  indicate  magnitudes  of  the  velocity  components  across  the  boundaries. 

The results of this experiment are in good agreement with what one might expect given the game scenario. Consider, for example, the strategy selected by the red force. Recalling that the red units have an attack range of three cells while the blue units have an attack range of only one cell, one can appreciate red's strategy: attack blue from a safe distance with the bulk of its strength, while using its remaining force to block the passage of the blue force through the smooth pathway.


19.8  Conclusions 

It was difficult to find a scenario that was both interesting, in the sense that a significant amount of action occurred during the game, and for which a game-theoretic solution could be found by the SLQ algorithm. The main issue was how to select the weights appropriately for a particular scenario. The choice of the initial guess for the controls was also important in some cases. When the two forces had initial strength distributions spread uniformly across the whole game area, it was easier to find a solution than when the initial strength distributions were concentrated in small regions of the game area. However, it is still not clear at this time which features are most critical in determining whether a given scenario has an SLQ solution.

With respect to computational complexity, the Game Flow solution engine executes relatively fast, using a combination of Matlab built-in functions (e.g., ODE solvers) and custom-made C++ routines. For example, the CPU time required to run a single iteration in one of the experiments (8x8 grid) was less than ten seconds. Still, it must be mentioned that some experiments took hundreds of iterations to attain convergence of the SLQ algorithm. The experimental results in this report were obtained by running the Game Flow program on an 800 MHz PC with 500 MB of RAM.

In the current version of the Game Flow model, physical features of the theater itself can affect the dynamics in two ways: i) through the rate of attrition of the human and/or mechanical assets in the field; ii) through the energy cost associated with the movements of assets in the field. The latter feature is not implemented directly in the differential equations of the system. Instead, it is represented in the payoff functions of the two forces. A velocity reduction parameter, similar to an attrition parameter, could be implemented in future versions of the Game Flow model.

Another characteristic of the current version of the Game Flow model is that physical features of the theater have no effect on the efficiency of attack. Hence, the enemy cannot hide behind a mountain range, for instance. Also, the efficiency of attack functions can be given a directionality before the beginning of a game, but they cannot be rotated or reoriented as the game evolves. Allowing such reorientation could be an important addition to enhance the strategic capabilities of the Game Flow model.

It is hoped that adding complexity to the model will enlarge the class of interesting problems that can be solved with the Game Flow program.
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